For a small number of nodes, I would not bother with OpenStack or other cluster/cloud infrastructure. Just install the same distro on all nodes and configure them to mount a shared filesystem (as /home, for instance). You may not even need a scheduler (it depends on whether you have multiple users, or often have a backlog of jobs).
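To make the scheduler-less setup concrete, here is a minimal sketch of round-robin job dispatch over SSH. It assumes passwordless SSH from the launch machine to every node and a shared /home so all nodes see identical paths; the hostnames and file paths below are placeholders, not anything from your actual setup.

```python
#!/usr/bin/env python3
"""Minimal round-robin dispatcher for a scheduler-less mini-cluster.

Assumes passwordless SSH to each node and a shared filesystem (e.g. an
NFS-mounted /home) so every node sees the same paths. Hostnames and
paths are placeholders.
"""
import itertools
import subprocess

NODES = ["node01", "node02", "node03", "node04"]  # hypothetical hostnames

def dispatch(commands):
    """Start each shell command on the next node, round-robin, then wait."""
    procs = [subprocess.Popen(["ssh", node, cmd])
             for node, cmd in zip(itertools.cycle(NODES), commands)]
    for p in procs:
        p.wait()  # block until every remote job has finished

if __name__ == "__main__":
    # Jobs read and write the shared /home, so no file staging is needed.
    jobs = [f"gzip -t /home/shared/reads/sample_{i}.fastq.gz" for i in range(8)]
    dispatch(jobs)
```

Once multiple users or job backlogs become the norm, replace something like this with a real scheduler such as SLURM or Grid Engine.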
Sequence-processing codes are often quite file-intensive, so I would first make sure your storage infrastructure is solid and fast.
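Before committing a pipeline to the shared filesystem, a crude throughput check is worthwhile. A rough sketch follows; the test path and sizes are made up, and the read figure can be inflated by the page cache, so treat the numbers as a sanity check rather than a benchmark.

```python
#!/usr/bin/env python3
"""Crude sequential write/read throughput check for a filesystem."""
import os
import time

PATH = "/home/shared/.throughput_test"  # hypothetical shared mount
SIZE = 1 << 30   # 1 GiB test file
CHUNK = 1 << 20  # write/read in 1 MiB chunks

def timed(label, fn):
    start = time.time()
    fn()
    secs = time.time() - start
    print(f"{label}: {SIZE / secs / 1e6:.0f} MB/s")

def write():
    block = os.urandom(CHUNK)  # generate once so RNG speed doesn't skew timing
    with open(PATH, "wb") as f:
        for _ in range(SIZE // CHUNK):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # count the flush to disk, not just the page cache

def read():
    with open(PATH, "rb") as f:
        while f.read(CHUNK):
            pass

timed("write", write)
timed("read", read)
os.remove(PATH)
```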
You cannot simply connect those four machines into a single shared-memory machine (at least not without some expensive hardware, and even then performance would be disappointing given the interconnect latency of an architecture cobbled together like that).
If you meant a Beowulf-type parallel computing cluster, even then you really need to find yourself a good UNIX cluster sysadmin. Setting up a cluster, and the batch-processing environment it needs, is a fairly involved procedure, well beyond what can be explained on a forum discussion board.
Small compute clusters that come pre-configured (e.g., a head node and 3-5 compute nodes, usually with some RAID storage and a switch) are actually quite reasonably priced, especially if you do not have the UNIX admin expertise in house to build one from scratch.
A cheap alternative might be to set up each machine as a stand-alone UNIX box and look into some of the open-source or commercial grid-computing solutions to use those four machines for distributed computation.
Rephrasing my question: I need a system with the specifications mentioned above, within a week, and I don't have the slightest idea about clustering. Would it be advisable to use a web-based service, or is there a cheaper way to fulfill my needs?
A web-based system is nice in that all of the administration is handled for you. The price may also be very attractive, especially if this is a compute cluster you only need for a short, finite time to complete a discrete piece of work. Buying a small cluster makes more sense if it is something you'll be using daily or weekly for several years.
The only potential limitation of a web-based service is how much data you need to move back and forth, and how much bandwidth you have for that.
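Back-of-the-envelope arithmetic makes the bandwidth question concrete; the figures below are purely illustrative, not from the original poster's setup.

```python
def transfer_hours(data_gb, uplink_mbps):
    """Hours to move data_gb gigabytes over an uplink of uplink_mbps megabits/s."""
    return data_gb * 8e9 / (uplink_mbps * 1e6) / 3600

# A hypothetical 200 GB batch of raw reads over a 100 Mbit/s uplink:
print(f"{transfer_hours(200, 100):.1f} h")  # ~4.4 h at full link utilization
```

If each sequencing run produces hundreds of gigabytes and the uplink is shared, moving the data can easily come to dominate the analysis time.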
But certainly, if you need something up and running quickly and have no admin experience or access to dedicated sysadmins, then leasing a cloud-based cluster service may be ideal.
They have a "free" version that is somewhat limited but might help you solve your problem. I have no financial interest in the company, nor can I vouch for how well the product works.
I guess I'd also say I am a bit confused about exactly what you are looking for. Do you want a large shared memory resource (i.e. a single, multi-core machine where all cores share access to a single large bank of memory), or do you really want a multi-node parallel compute cluster?
You can lease either one from numerous cloud computing companies, but they are different things for different applications.
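To illustrate the distinction, here is a toy sketch (not an NGS pipeline): shared-memory parallelism keeps all workers on one machine, sharing its cores and RAM, while a multi-node cluster spreads independent ranks across machines that communicate over the network.

```python
"""Toy contrast of shared-memory vs. multi-node parallelism."""
from multiprocessing import Pool

def process(read):
    # Stand-in for any per-read computation.
    return read.upper()

if __name__ == "__main__":
    # Shared-memory model: one machine, workers share its cores and RAM.
    with Pool() as pool:
        print(pool.map(process, ["acgt", "ttga", "ccag"]))

    # Multi-node model: independent processes on separate machines, e.g.
    # MPI ranks (via the real mpi4py library), launched across the cluster:
    #   mpirun -np 4 python this_script.py
    # with each rank handling its own shard of the data.
```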
Michael, I want a multi-node parallel compute cluster for ngs data analysis purposes. Do you have suggestions for any cloud-computing company that is budget-friendly and efficient?
If this is specifically for NGS, then contact your NGS sequencer representative for your area. As far as I know, all of the companies (Illumina and Life Technologies for certain) offer deals on cloud-based clusters pre-configured for their sequencing technology. I don't actually know who they contract with for the cloud compute resources (although I think ABI in the USA used to, or still does, use IBM cloud services).
The advantage of going through them is that they'll provide a virtual cluster already set up with much of what you'll need or want for your NGS analysis pipeline, and you'll be able to get technical support on the analysis tools through them.
I honestly have no idea on price, though. We bought our ABI SOLiD system several years ago and went with a physical cluster instead of a virtual one (ABI sold their physical clusters pre-configured, shipped directly from Penguin Computing). Our ISP's bandwidth was the deciding factor for us - simply not enough (at that time) to make a virtual cluster practical for our data needs.
Of note, in case you still want to build a local solution from your own hardware: i3 processors are severely limited in the maximum amount of main memory (RAM) they can address, which is why you would need AMD or Intel i7 processors. In addition, some mainboards restrict this to less than 32 GB - the HP Z400 may be one of them - even when a server-grade i7 processor is installed. With certain consumer-grade mainboards I managed to use 48 GB of RAM per machine.
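If you want to verify how much memory a given board/OS combination actually exposes, a quick check on Linux (a sketch only):

```python
#!/usr/bin/env python3
"""Print how much RAM the OS actually sees (Linux only)."""

with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith("MemTotal"):
            kib = int(line.split()[1])  # /proc/meminfo reports kB (really KiB)
            print(f"Usable RAM: {kib / 1024**2:.1f} GiB")
            break
```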
Thanks Vishal, that was also my favorite choice for a single workstation. Please note: the only dual-socket mainboard I could find that is compatible with AMD Opteron 6300 processors out of the box is the ASUS KGPE-D16; check the ASUS webpage to be sure. Be aware that if you use 16 RAM modules they NEED to be registered (i.e., DDR3 ECC registered); otherwise you can only use 8 "normal" DDR3 modules with a maximum capacity of 8 GB each. Nowadays there is hardly a price difference between server-grade registered modules and normal ones.