I want to divide a single program into processes which can then be sent to multiple machines for processing, in a LAN environment. Any help or links will be appreciated.
Bag of tasks - where there are several different independent tasks that you would like to solve/simulate.
Domain decomposition - where you have one huge problem to solve/simulate and you would like to divide it into pieces and send them to different processes; at each step of the simulation you need to keep the pieces consistent with the original problem (normally by exchanging the boundary values between processes).
PVM is rather archaic, but MPI is its modern successor and has been around for about 18 years now. Other frameworks for distributed parallel programming include Charm++, Intel's Concurrent Collections, and HPX.
It is difficult to answer your question because it is too open-ended. We need more detail about the problem and the numerical methods that you want to use.
In my case I work with the wave equation solved by finite difference methods... If you search for these terms on Google, several links will appear... I learned about this from some online tutorials (I'm sure there are good materials on the Intel and NVIDIA websites).
For domain decomposition you will need a library for communication between processes; as far as I know, MPI is the most widely used one.
The basic idea behind domain decomposition is that you split your domain across different processes and still solve the entire problem... If you need some references on finite differences I could try to upload some of my previous articles. Let me know if you are interested.
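To give a concrete picture of what "exchanging the boundary values between processes" means, here is a rough sketch in Python with mpi4py (this is not taken from any of my articles; the grid size and the simple averaging stencil are placeholders standing in for a real finite-difference update):

# Minimal sketch of 1D domain decomposition with halo (boundary) exchange.
# Assumes mpi4py is installed and the script is launched with mpirun.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 1000                       # total grid points (illustrative)
local_n = N // size            # points owned by this rank (assume N divisible by size)
u = np.zeros(local_n + 2)      # local array plus one ghost cell on each side
u[1:-1] = rank                 # dummy initial condition

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(100):
    # exchange boundary values with neighbours (Sendrecv avoids deadlock)
    comm.Sendrecv(sendbuf=u[1:2],   dest=left,  recvbuf=u[-1:], source=right)
    comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # local update: a simple averaging stencil standing in for the real
    # finite-difference update of the wave equation
    u[1:-1] = 0.25 * u[:-2] + 0.5 * u[1:-1] + 0.25 * u[2:]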
Thanks, and sure, I'll ask you if I come across finite differences.
Okay, right now I am looking for techniques like the ones Phil mentioned. My idea for the project is to create an environment at my university that lets a simulation run its instances on multiple machines.
Say I have a large simulation that takes around 4-5 hours or more to run over various parameters. I want to split this across separate machines so that each parameter is computed on a single machine, and the results can be combined later. There is not much dependency between them. Similar Unix/Linux-like environments will be available on all the machines.
So I was wondering whether this is achievable and how I should move ahead? I hope I am not being naive or unclear.
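To make the idea concrete, something along these lines is what I have in mind; the hostnames and the ./simulate command are placeholders, and it assumes passwordless SSH and the same binary installed on every machine:

# Rough sketch: run one parameter per remote machine over SSH and collect the outputs.
import subprocess
from concurrent.futures import ThreadPoolExecutor

machines = ["lab-pc-01", "lab-pc-02", "lab-pc-03"]   # hypothetical LAN hosts
parameters = [0.1, 0.2, 0.3]                         # one parameter per machine here

def run_remote(host, param):
    # each call blocks until the remote simulation finishes and returns its stdout
    cmd = ["ssh", host, f"./simulate --param {param}"]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

with ThreadPoolExecutor(max_workers=len(machines)) as pool:
    results = list(pool.map(run_remote, machines, parameters))

# later: combine the per-parameter results
print(results)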
An excellent book on these topics is:
Mattson, Sanders, and Massingill, "Patterns for Parallel Programming", Addison-Wesley, 2004.
I believe you may find it on Google Books (or else look in your local library). Keep in mind, though, that while it covers all major techniques for domain decomposition, task decomposition, divide-and-conquer, etc., and also all major styles of parallel programming (message passing as well as shared-memory thread programming), it does require some maturity in programming. Many examples are given, mostly in C/C++ and Java (for multi-threading); some FORTRAN code can be found as well. It even includes techniques for exploiting GPUs for massive numeric operations in parallel.
Regarding parallel programming frameworks for distributed-memory environments such as the one you have in mind, besides PVM and MPI (and MPICH), there is also Apache Hadoop (mostly used by Java programmers, though), Apache River (implementing the Linda model of parallel programming: a distributed shared-memory protocol), and some older but very robust systems such as Wisconsin Condor, DCP, etc. (OpenMP, by contrast, targets shared-memory machines rather than clusters.) And of course, there are also very strong distributed-memory message passing systems for .NET.
Thanks @Ioannis Christou. After going through the topics, I have realised that what I can do is work on dividing the jobs (simulations per machine) dynamically.
Suppose there are 100 simulations and 10 workstations; initially 5 jobs are assigned to each, and then maybe I can schedule it so that the workstation that finishes its simulations earliest is given the next 5 queued jobs, and so on until all 100 are done. I could at least begin this way as a proof of concept and then work my way through enhancements and modifications.
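As a sketch of that scheduling idea (handing out one job at a time instead of batches of 5, just to keep it short; the hostnames and the ./simulate command are again placeholders, and passwordless SSH is assumed):

# A shared queue of simulation parameters and one dispatcher thread per
# workstation: whichever machine finishes first simply pulls the next job.
import queue
import subprocess
import threading

workstations = [f"lab-pc-{i:02d}" for i in range(1, 11)]   # 10 hypothetical hosts
jobs = queue.Queue()
for param in range(100):                                   # 100 simulations
    jobs.put(param)

results = {}
lock = threading.Lock()

def dispatcher(host):
    while True:
        try:
            param = jobs.get_nowait()      # grab the next queued job, if any
        except queue.Empty:
            return                         # queue drained: this host is done
        out = subprocess.run(["ssh", host, f"./simulate --param {param}"],
                             capture_output=True, text=True).stdout
        with lock:
            results[param] = out

threads = [threading.Thread(target=dispatcher, args=(h,)) for h in workstations]
for t in threads:
    t.start()
for t in threads:
    t.join()
# all 100 results are now in `results`, ready to be combined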
The book you mentioned is great, and I can surely use it for adding various things to the project. Thanks.
Dear Atif, there are two different parallelization granularities: coarse grain and fine grain. Fine grain executes small units of work in parallel; it is where you can expect the best load balancing. Coarse grain is what you intend to do, using bigger execution units. The advantage is that it is much easier to implement; the drawbacks are that for programs with long execution times the load can become unbalanced, and resources such as memory are not distributed.
Whatever model of parallel programming you use (MPI, for example), you must first determine which parts of your program are independent and can therefore be parallelized. This lets you compute what percentage of your program can be parallelized, and whether it is worth the work. Never forget that parallelization implies synchronization, and synchronization has a cost that does not exist in your original program. So if your program has a low proportion of independent parts, it is likely that all the gain you obtain by parallelizing the code will be lost in synchronization costs.
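As a quick back-of-the-envelope illustration of this point, Amdahl's law gives the best-case speedup from the parallelizable fraction p on n machines (the numbers below are made up):

# speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction
# and n the number of machines; synchronization costs would only lower this.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.5, 0.9, 0.99):
    print(f"p = {p}: speedup on 10 machines = {speedup(p, 10):.2f}")
# p = 0.5  -> 1.82  (half the program serial: 10 machines barely help)
# p = 0.9  -> 5.26
# p = 0.99 -> 9.17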
To answer your question technically, you can use any client-server type of platform, for example WCF (Windows Communication Foundation), where the services act as a piece of code (the method). In WCF these are services with methods, and the methods take parameters, which means the data can be passed as parameters. Try having the same services with methods on multiple machines... this will closely simulate your requirement.
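Not WCF itself, but the same "service with a method that takes parameters" idea can be sketched with Python's standard-library XML-RPC, which also fits the Unix/Linux machines mentioned earlier; the method name, port, and hostnames here are made up.

Server (run one of these on each workstation):

from xmlrpc.server import SimpleXMLRPCServer

def run_simulation(param):
    # stand-in for the real simulation; returns something derived from param
    return param * param

server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
server.register_function(run_simulation)
server.serve_forever()

Client (run from any machine on the LAN):

import xmlrpc.client

hosts = ["lab-pc-01", "lab-pc-02"]                 # hypothetical hosts
proxies = [xmlrpc.client.ServerProxy(f"http://{h}:8000") for h in hosts]
results = [proxy.run_simulation(p) for proxy, p in zip(proxies, [0.1, 0.2])]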
"Suppose there are 100 simulations and 10 workstations; initially 5 jobs are assigned to each, and then maybe I can schedule it so that the workstation that finishes earliest is given the next 5 queued jobs, and so on for all 100."
An easy way to do this, if you have administrative rights on the machines, is to install a resource scheduler like SLURM or Torque. Then just submit all 100 jobs, each writing its results to a file. These 100 jobs will slowly wend their way through the pool. After they are complete, run a job which combines the answers.
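For instance, a small submission script along these lines could queue all 100 runs; the time limit, output file names, and ./simulate command are placeholders for whatever your simulation actually needs:

# Submit the 100 simulation runs to SLURM with sbatch, one output file per job.
import subprocess

for param in range(100):
    subprocess.run([
        "sbatch",
        "--time=01:00:00",                       # per-job time limit
        f"--output=result_{param}.out",          # one output file per job
        f"--wrap=./simulate --param {param}",    # hypothetical simulation command
    ], check=True)

# after all jobs finish, combine the outputs, e.g.:
#   cat result_*.out > all_results.txt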
I have done this with a lab of machines which closed for the night. Jobs would queue up during the day, and at 5 pm the queues would be started. Jobs were allowed to start only if the time limit specified on the job ensured that they would complete before the lab opened the next day. This was a great way to utilize computing power that would otherwise have been sitting idle.
To parallelize a process, you need to understand which parts of it can be done independently of each other. These parts can be mapped onto different processors and executed separately. Their outcomes can then be reassembled, and finally the overall process is complete. Hadoop MapReduce can help you program such a parallel process with ease.
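This is not Hadoop itself, but the map/reduce shape can be sketched in plain Python to show the idea; the simulate function here is a made-up stand-in for one independent piece of the overall process:

# "map" runs the independent pieces in parallel, "reduce" reassembles the outcomes.
from multiprocessing import Pool
from functools import reduce

def simulate(param):
    # placeholder for one independent piece of the overall process
    return param * param

if __name__ == "__main__":
    params = range(100)
    with Pool() as pool:
        partial_results = pool.map(simulate, params)      # map phase
    total = reduce(lambda a, b: a + b, partial_results)   # reduce phase
    print(total)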