Speed depends on the number of cores per processor. On an Intel i5-2400 3.1 GHz quad-core processor, GROMACS 4.5 can give a speed of ~8 ns/day for a ~30,000-atom system with standard simulation parameters. The Xeon E5-1650 has 6 cores at 3.2 GHz, so I expect it to give ~12 ns/day or more for a similar system size. Also, GROMACS 4.6 is faster than GROMACS 4.5.
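To make that estimate explicit, here is a rough back-of-the-envelope calculation. It assumes throughput scales linearly with cores times clock speed, which GROMACS does not achieve exactly, so treat the number as an optimistic upper bound rather than a benchmark:

```python
# Naive linear-scaling estimate (cores x clock); real GROMACS scaling
# is sub-linear, so this is only an upper-bound sanity check.
ref_cores, ref_clock, ref_ns_day = 4, 3.1, 8.0   # i5-2400, ~30,000-atom system
new_cores, new_clock = 6, 3.2                    # Xeon E5-1650

estimate = ref_ns_day * (new_cores * new_clock) / (ref_cores * ref_clock)
print(f"~{estimate:.1f} ns/day")                 # prints ~12.4 ns/day
```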
If you want to speed up your simulations using a GPU, then go for an NVIDIA Tesla, because other cards like GTX or Quadro are not made for 24x7 calculations.
Hello Dear Rajendra Kumar, thank you for answering my question; it is a very relevant answer. I was thinking of purchasing the same Xeon E5-1650 you mention above. When you say Quadro cards are not made for 24x7, what does that mean? Actually, we are planning to buy a Quadro 4000 2 GB graphics card.
If you want to use a GPU for simulation, then buy an NVIDIA Tesla. Cards like GTX or Quadro are fine for testing and debugging purposes. Simulations may run continuously for days or weeks, which can stress the hardware; GTX and Quadro cards are not made for heavy continuous use, while a simulation is a long and computationally expensive process.
Probably this answer is too late for you, but I would like to add to Rajendra's answer. These points are based on my benchmarks running simulations with GROMACS 2018 or higher:
For the same number of cores, I have found Xeon Cascade Lake processors to work better than their AMD Threadripper counterparts. However, at a comparable price, the new 7 nm AMD EPYC processors are far better than any Xeon, on both single and multiple threads. Given that most acceleration now involves GPUs, I would say use an NVIDIA graphics card, as GROMACS has a better CUDA scheme than OpenCL. That said, I haven't actually used AMD GPUs in any combination for MD runs, so you might want to test one before finalizing a GPU; AMD alternatives are really cheap compared to NVIDIA ones.
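As a quick sanity check on a machine you already have access to, `gmx --version` reports how the GROMACS build was configured, including its GPU backend (CUDA or OpenCL) and SIMD level. A minimal sketch; the exact label text can differ slightly between GROMACS versions, so adjust the keywords if needed:

```python
# Print the GPU-backend and SIMD lines from `gmx --version` output.
import subprocess

out = subprocess.run(["gmx", "--version"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "GPU support" in line or "SIMD instructions" in line:
        print(line.strip())
```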
If your simulation box is not too huge, I would ask you not to buy a workstation-grade GPU, because you will definitely not need that much memory. For example, for an mdrun with 4 MPI ranks, each with 4 OpenMP threads, running on 16 Xeon Skylake cores, a Tesla T4 (with 16 GB of GPU memory, which is far more than needed) on a 10 nm x 10 nm x 30 nm simulation box uses only about 3 GB of RAM and 350-400 MB of GPU memory, with all CPU and GPU cores running at n*100% capacity. What makes a difference, then, is the number of cores, as Rajendra mentioned previously, along with clock speed (both CPU and GPU) and SIMD architecture.
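For reference, a run with that rank/thread layout could be launched roughly as follows. This is only a sketch: "md" is a placeholder file prefix, the thread counts should match your own hardware, and `gmx mdrun -h` lists the options your build actually supports:

```python
# Sketch of launching 4 thread-MPI ranks x 4 OpenMP threads (16 cores)
# with non-bonded work offloaded to the GPU.
import subprocess

subprocess.run(
    [
        "gmx", "mdrun",
        "-deffnm", "md",   # input/output file prefix (placeholder)
        "-ntmpi", "4",     # 4 thread-MPI ranks
        "-ntomp", "4",     # 4 OpenMP threads per rank
        "-nb", "gpu",      # offload non-bonded interactions to the GPU
    ],
    check=True,
)
```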
My recommendation: buy an AMD Threadripper with 12 cores/24 threads, an RTX 2080 Super/Ti, and just 12 GB of RAM. That is probably more than enough and much better than your previous options.
I have found that although a higher per-core frequency helps, a lot of time is lost as GPU wait time and in redistributing work every n steps, as set in your .mdp file. In such cases it is worth also looking at the SIMD architecture. For example, even though the new AMD Threadripper processors have an AVX2 (256-bit) SIMD architecture, compared to Cascade Lake Xeon processors whose higher-end models also have an extra FMA unit, communication between the GPU and the AVX2 (256-bit) code path is observed to be faster. In addition, the maximum core count you can get in AMD Threadripper or (if you can wait until they launch) EPYC Rome workstations, 64 cores/128 threads and 128 cores/256 threads, is much, much higher than anything Intel Xeon offers, so I would suggest you opt for that. The i9 10th-generation processors are not built to run at full capacity 24 hours a day.
About molecular dynamics: what I see becoming the bottleneck when simulating small to medium simulation boxes (100 x 100 x 100 cubic angstroms) is the wait time; the simulation time per frame (or cached frame) is never the slower step. I would suggest you go for more cores that can distribute GPU processes well, together with a higher SIMD architecture.
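If you want to check whether a run is actually wait-bound, GROMACS appends a cycle/time accounting table to the log file at the end of a run. A quick sketch for pulling out the wait-related rows; "md.log" is a placeholder for your own log name, and the row names can vary between GROMACS versions:

```python
# Print the "Wait ..." rows from the cycle accounting table in md.log.
from pathlib import Path

for line in Path("md.log").read_text().splitlines():
    if line.strip().startswith("Wait"):
        print(line.rstrip())
```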
P.S. The NVIDIA Quadro RTX is outdated now. For virtual screening, unless you are using the Turing tensor cores for AI the way Atomwise does for HTS lead discovery, buy two 2080 Ti cards if you have the money, or better yet, just wait until the end of the year (I hope) and buy a 3090. They are almost 4x cheaper with much higher performance, as the 30 series is based on the 2nd-generation RTX architecture. I think that is sufficient for HTS, and for any purpose the upcoming Ampere-architecture GPUs will be a win-win.