I have experience in CPU and GPU/CPU MD simulation of protein/DNA systems with three codes:
Gromacs, NAMD, and AMBER. AMBER is surely the fastest as long as you have a single node with 1-4 GPUs and use a ratio of one CPU core per GPU; too bad AMBER11 and AMBER12 are not freely available, and they are perhaps a little biology-oriented. The single-node performance of NAMD is lower, but it scales well to a high number of nodes (with an Infiniband connection), has no restriction on the #CPU/#GPU ratio, and is freely available. The Gromacs GPU implementation is not very mature and gives only a 2-3x speed-up compared to pure CPU. Depending on your system and your hardware, the choice is between AMBER and NAMD.
I would suggest using LAMMPS from Sandia Labs (http://lammps.sandia.gov).
It is not just a widely applicable MD simulator: you can also explicitly control which tasks (pair interaction calculation, K-space, etc.) are executed on the CPU or the GPU; see the sketch after this answer.
Of course it is also capable of using OpenMP/MPI for parallel job execution.
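As a rough illustration of that per-task control, here is a minimal sketch using the official LAMMPS Python wrapper. It assumes a LAMMPS build with the GPU package compiled in; the exact `package gpu` syntax varies between versions, and the system here is just a generic LJ melt.
```python
# Minimal sketch: pinning the pair computation to the GPU in LAMMPS,
# driven through the "lammps" Python wrapper (GPU package must be built in;
# command syntax may differ between LAMMPS versions).
from lammps import lammps

lmp = lammps()

lmp.command("units lj")
lmp.command("atom_style atomic")

# Request one GPU (the package command must appear before the box is defined;
# neighbor-list building and CPU/GPU load splitting are tunable here).
lmp.command("package gpu 1")

lmp.command("lattice fcc 0.8442")
lmp.command("region box block 0 10 0 10 0 10")
lmp.command("create_box 1 box")
lmp.command("create_atoms 1 box")
lmp.command("mass 1 1.0")

# Pair interactions on the GPU: pick the '/gpu' suffix variant explicitly.
# K-space solvers offer the same choice (e.g. plain 'pppm' on the CPU versus
# 'pppm/gpu'), which is how individual tasks are assigned to one device or the other.
lmp.command("pair_style lj/cut/gpu 2.5")
lmp.command("pair_coeff 1 1 1.0 1.0 2.5")

lmp.command("velocity all create 1.44 87287")
lmp.command("fix 1 all nve")
lmp.command("run 100")
```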
In addition to Frank's suggestion, I would add the recent development of OpenMM, which provides GPU acceleration for CHARMM. Here are the code and documentation:
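For orientation, a minimal sketch of an OpenMM run starting from CHARMM files is shown below. It uses the current Python API (older releases used the `simtk.openmm` namespace); all file names and the box size are placeholders for your own system.
```python
# Hedged sketch: GPU-accelerated MD with OpenMM using CHARMM input files.
# File names, box size, and force-field files are placeholders.
from openmm.app import (CharmmPsfFile, CharmmCrdFile, CharmmParameterSet,
                        PME, HBonds, Simulation)
from openmm import Platform, LangevinMiddleIntegrator
from openmm.unit import nanometer, kelvin, picosecond, picoseconds

psf = CharmmPsfFile('system.psf')
crd = CharmmCrdFile('system.crd')
params = CharmmParameterSet('top_all36_prot.rtf', 'par_all36_prot.prm')

psf.setBox(6.0*nanometer, 6.0*nanometer, 6.0*nanometer)   # required for PME
system = psf.createSystem(params, nonbondedMethod=PME,
                          nonbondedCutoff=1.2*nanometer, constraints=HBonds)

integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
platform = Platform.getPlatformByName('CUDA')              # run on the GPU

sim = Simulation(psf.topology, system, integrator, platform)
sim.context.setPositions(crd.positions)
sim.minimizeEnergy()
sim.step(1000)
```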
If you are asking about an electronic structure program, then go for TERACHEM,
the first electronic structure code written from scratch for GPUs. It is highly recommended and amazingly fast; I have used it and published my results.
As Frank said, LAMMPS has GPU support, but you have to specify whether each command/instruction is executed on the GPU or the CPU. In my opinion, leaving this to the user does not help optimise the efficiency of the code, but on the other hand you can test a large number of possibilities and see which one runs faster on your machine.
If you do not want to bother with testing/tuning, my choice would be DL_POLY. Fast, robust, easy to use, flexible, and good support.
@German: I partially agree; it depends on the user's level of background knowledge, given the complexity of the algorithms and the scaling behaviour of the respective system. ;)
Concerning NAMD, as far as I know it can run either on several CPUs or on *one* GPU, but it is not possible to use several GPUs or to mix GPUs and CPUs.
Gromacs has come out with a beta release of version 4.6 which can do mixed CPU/GPU simulations. If I understand correctly, it computes the non-bonded parts on the GPU and the bonded energies on the CPU, making it quite fast. For a 187-amino-acid protein (roughly 1714 atoms), I was able to get 53 ns/day using a 16-core CPU and a GTX 670.
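For what it's worth, a 4.6-era run like that is typically launched as sketched below; the flags and binary name are my assumptions for that release (newer versions use `gmx mdrun`), and the run-file prefix is a placeholder.
```python
# Hedged sketch: driving a GROMACS 4.6-style mdrun from Python so the
# non-bonded kernels go to the GPU while bonded terms and PME stay on the CPU cores.
import subprocess

subprocess.run(
    ["mdrun",
     "-deffnm", "protein_md",   # placeholder .tpr / output prefix
     "-nb", "gpu",              # offload non-bonded interactions to the GPU
     "-ntomp", "16",            # 16 OpenMP threads for bonded terms and PME
     "-gpu_id", "0"],           # which CUDA device to use
    check=True)
```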
Dear Nino, the answer depends on which kind of MD you want to use. On the ab-initio molecular dynamics and plane-wave side, the Quantum ESPRESSO code is a good alternative. It is easy to use and it is easy to understand how it uses CPUs and GPUs.
For classical molecular dynamics I propose the NAMD 2.9 code. With NAMD you can use as many GPUs as you want: it can run on a single multi-GPU node or on many nodes (each with one or more GPUs), but the nodes need to be connected through Infiniband. However, NAMD computes only the electrostatic part on the GPU, so the CPUs are used for the other calculations and for inter-node communication. A calculation on an 8-CPU node with 2 Tesla GPUs runs in a time corresponding to using only about 40 CPUs. The gain is thus significant but not enormous.
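For a single node, a CUDA-enabled NAMD run on two GPUs is typically started as sketched here; the flags follow the NAMD CUDA notes for a multicore build, and the configuration file name is a placeholder.
```python
# Hedged sketch: launching a single-node NAMD 2.9 CUDA job on 8 CPU cores
# and 2 GPUs from Python (multicore build assumed; config file is a placeholder).
import subprocess

subprocess.run(
    ["namd2",
     "+p8",                # 8 CPU cores for bonded terms, integration, communication
     "+idlepoll",          # recommended for CUDA builds
     "+devices", "0,1",    # the two GPUs handling the offloaded non-bonded/electrostatic work
     "md.conf"],
    check=True)
```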
Why do you need CPU/GPU mixing? Are you worried about single-precision calculations? You can achieve a much better speed-up if all calculations are performed on the GPU. Amber 12 is the best MD GPU software and, most importantly, it has been very well tested by people who know what they are doing. They use a mixed SP-DP precision model. It is the fastest GPU code and is faster than NAMD. In fact ACEMD is faster because they use a 4 fs time step, but I don't trust that approach. So if you don't want to compromise on precision, just go for Amber. A new benchmark has been published recently on their web site:
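To make the "everything on the GPU" point concrete, this is roughly how a single-GPU Amber run is launched; one CPU core drives `pmemd.cuda`, which uses the mixed-precision model mentioned above. File names are placeholders.
```python
# Hedged sketch: a single-GPU Amber run via pmemd.cuda (file names are placeholders).
import subprocess

subprocess.run(
    ["pmemd.cuda", "-O",
     "-i", "md.in",           # MD control input
     "-p", "system.prmtop",   # topology / parameters
     "-c", "system.inpcrd",   # starting coordinates
     "-o", "md.out",
     "-r", "md.rst",
     "-x", "md.nc"],
    check=True)
```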
Nino, great question. I asked my friend at SFSU what he thought about these opinions, and this is what he answered: "It was helpful to know that my choice of Amber is one of the best. NAMD requires Infiniband, which is expensive. I'd rather have my machines with an eight-core processor and two Nvidia cards, which are cheap, and not have to use the network at all. Besides the Nvidia cards, the machine costs about $700, with 16 GB of RAM. There is also no need to buy Tesla cards; ordinary cards are sufficient and cost much less. I bought a GTX 580 at the end of last year for $500, and this card beats my Tesla C9050, which cost $6,000, in speed. The new-generation architectures based on Nvidia Kepler are also cheap. In addition, the people at Nvidia / Ross Walker at UCSD are experimenting with a way to do the MD calculation using integers rather than reals, and this way you can get even higher speeds with commercial GPU cards. I have yet to read the article and understand it, and I dare not do calculations with this method using integers because it is too new to know its limitations, but standard MD methods, such as Amber with CPU-GPU, are opening up new lines of inquiry."
In addition, we expect a new and much faster GPU very soon (the name is not clear yet, GTX 780 or Titan) that will be the desktop version of the Tesla K20X. It will be around 2x faster than the GTX 580. Personally, I am waiting for this card to upgrade my PCs, which are equipped with GTX 580 3 GB GPUs.
I suggest being careful with the card choice... even if the peak performance of commercial cards designed for gaming (namely GTX cards) can be higher, they are not designed for very intense and prolonged (on the order of several days) calculations. Moreover, the on-board memory is usually lower (1-3 GB) compared to Tesla or Kepler cards (3-6 GB), limiting the maximum number of atoms you can simulate.
In theory you are right, but I've worked for more than two years with GTX cards and never had a problem with longer simulations. Moreover, the memory requirements were dramatically reduced by the latest Amber 12 patch; for instance, for a 60K-atom system you need only around 0.3-0.5 GB of memory. So unless you are simulating a system of several million atoms, you don't need 6 GB. The majority of people use GTX cards too (see the Amber mailing list), and even Ross Walker suggested this much cheaper solution: "I believe, unique to Exxact, can feature GTX680 cards providing stunning performance for extremely reasonable prices." More information about the official Amber team recommendations is here:
http://ambermd.org/gpus/recommended_hardware.htm
For me, buying a Tesla is just a waste of money. However, we have to wait and see what Tesla K20X desktop version will be released, if they release such a card.
It seems no one has tried Acellera's MD package, ACEMD. The software is faster than Amber and they also sell hardware optimized to run MD applications.
You can also try TeraChem, which has Born-Oppenheimer and classical molecular dynamics and is capable of using both CPUs and up to several GPUs. As far as I know, the scalability of the calculations is quite good.