I wrote a shallow water equation code in MATLAB, but it solves sequentially and takes a lot of time, so I was wondering how I can transform my code into a parallel one so that it can utilize multiple cores.
If I understand your problem correctly, you have non-linear Navier-Stokes equations for an AUV in a shallow water environment, and you want to solve these equations in real time while running your code.
1. First, solve the equations offline (you may use Fluent, for example), then construct a coefficient list covering all parameters.
2. The coefficient list is a table that includes all hydrodynamic, hydrostatic, etc. coefficients obtained for different swimming conditions.
3. Your code then looks up this table depending on the dynamics of the vehicle.
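The table lookup in step 3 can be sketched as a simple interpolation. This is only an illustration: the condition values, the coefficients, and the single drag coefficient are hypothetical placeholders, not data from any real CFD run.

```matlab
% Hypothetical sketch of step 3: looking up a hydrodynamic coefficient
% from a table precomputed offline (e.g. in Fluent) for different
% swimming conditions. All numbers here are made up for illustration.
speeds   = [0.5 1.0 1.5 2.0];       % swimming conditions (m/s)
dragCoef = [0.90 0.80 0.75 0.72];   % corresponding drag coefficients

v  = 1.2;                           % current vehicle speed from the dynamics
Cd = interp1(speeds, dragCoef, v);  % linear interpolation in the table
```

In a real vehicle model the table would typically have several dimensions (speed, angle of attack, depth, ...), looked up with `interpn` instead of `interp1`.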
You can use the Parallel Computing Toolbox provided by MATLAB (https://www.mathworks.com/products/parallel-computing.html); however, efficient parallelization requires a correct formulation of the algorithms.
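As a minimal sketch of what the toolbox gives you: the easiest win is usually `parfor` over iterations that are independent of one another, e.g. a parameter sweep of whole simulations. Here `run_swe_case` and `caseParams` are hypothetical stand-ins for your own solver function and its inputs.

```matlab
% Hypothetical sketch: running independent shallow-water cases in parallel
% with the Parallel Computing Toolbox. 'run_swe_case' stands in for your
% existing sequential solver wrapped as a function.
caseParams = linspace(0.5, 2.0, 8);       % e.g. eight inflow speeds to sweep
results    = cell(numel(caseParams), 1);

parfor k = 1:numel(caseParams)
    % parfor requires the iterations to be independent of each other;
    % each case is dispatched to a worker in the parallel pool.
    results{k} = run_swe_case(caseParams(k));
end
```

Note that the time loop inside a single simulation is inherently sequential (each step depends on the previous one), so within one run you parallelize over the spatial grid instead: vectorized array updates already use MATLAB's built-in multithreading, and `spmd` with distributed arrays or `gpuArray` can take that further.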
Actually, it was ... years ago that the KAP preprocessor for Fortran did that: if the code was written with no dependencies, it was able to transform the sequential code into a parallel one. I used it many times, and it also provided the transformed source code with the OpenMP directives. It was later acquired by Intel and became part of the Intel compiler. At present I don't know if it is still being developed...
I did this work on a 3D Poisson equation discretized with a standard second-order scheme, using a "coloured" SOR method, and I got an efficiently parallelized source code that scaled correctly up to 8 CPUs. It also worked fine for parabolic equations. Of course, the main limitation is that it works only on shared-memory architectures. A more extensive parallelization requires structuring the algorithm ad hoc.
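The "coloured" SOR idea can be sketched in a few lines (in 2D rather than 3D for brevity, and in MATLAB to match the thread). The grid is split into red and black points like a checkerboard; each colour depends only on the other colour, so every half-sweep is a fully independent update — which is exactly what lets a tool like KAP, or OpenMP directives, parallelize it. Grid size, relaxation factor, and iteration count below are arbitrary illustration values.

```matlab
% Minimal red-black SOR sketch for the 2D Poisson problem -lap(u) = f
% on the unit square with zero boundary values.
n = 64; h = 1/(n+1); w = 1.8;        % grid size, spacing, relaxation factor
u = zeros(n+2);                       % solution incl. boundary ring
f = ones(n);                          % right-hand side on interior points

[I, J] = meshgrid(2:n+1, 2:n+1);      % interior index grid
red   = mod(I + J, 2) == 0;           % checkerboard colouring
black = ~red;

for iter = 1:500
    for colour = {red, black}
        m = colour{1};
        % Gauss-Seidel candidate from the four neighbours:
        unew = (u(1:n, 2:n+1) + u(3:n+2, 2:n+1) + ...
                u(2:n+1, 1:n) + u(2:n+1, 3:n+2) + h^2 * f) / 4;
        % Over-relaxed update of one colour only; all points of a colour
        % are independent, so this half-sweep parallelizes trivially.
        interior    = u(2:n+1, 2:n+1);
        interior(m) = (1 - w) * interior(m) + w * unew(m);
        u(2:n+1, 2:n+1) = interior;
    end
end
```

The same colouring carries over to 3D with `mod(i+j+k, 2)`; the point is that, unlike plain SOR, the loop over one colour has no loop-carried dependency.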
I agree; the starting point is to write the code efficiently.
It would be interesting to see MATLAB's capability in doing that... However, my interest in numerical simulation of turbulence nowadays requires larger computational power, and I doubt that can be done in MATLAB, even though it apparently also works on GPUs...
Well, I have both didactic reasons and some useful statistical tools I developed in MATLAB for post-processing. I have never really explored its CFD potential for research tasks, but there are some books addressing that.
HPE's "The Machine" follow-ons and the Gen-Z Consortium are working on a much larger shared-memory fabric system using load/store, byte-addressable, cache-line-granularity memory-channel semantics, but you won't be able to use virtual addressing across hundreds of compute nodes (there is no remote snooping of caches). Access to large sections of non-coherent memory would be via mmap() with base + offset addressing, and the overlapping halo regions would be shared using fabric atomics, or via an in-memory, byte-addressable mail system between the sub-grids. Any number of cores on the compute nodes should be able to access the memory, modulo some choreography to prevent traffic jams on the memory nodes. Progress on Gen-Z by the consortium and HPE is being made...
Gen-Z ISA is being prototyped on RISC-V (UC Berkeley): http://genzconsortium.org/wp-content/uploads/2019/04/Accelerating-Innovation-Using-RISC-V-and-Gen-Z_V1.pdf
Memory fabric error rates are being reduced by former SGI/now HPE personnel: http://genzconsortium.org/wp-content/uploads/2019/05/Highly-Resilient-25G-Gen-Z-PHY.pdf
What does that get you? No more marshaling and buffering for network sharing across sub-grids. No more marshaling and buffering for those 10 minutes of checkpoint writes every hour to maintain a consistent grid fallback point at a timestep an hour ago. All of that compute time and the expensive network and fabric hardware go away. MPI (or MPT) and OpenMP will still work. Your stencil code and meshes will have to change to be resilient to memory-node and compute-node failure, but that was foreseeable anyway. MATLAB would have to change too. But it should scale up and out.