Hi,
I am rather new to whole-genome analysis and am still struggling to estimate how long an analysis will take. Would you share your experiences on this topic with me and the ResearchGate community? For analyses on the cluster a walltime has to be set, so a good runtime estimate is needed. Useful information includes the genome size, the number of CPUs, and the exact command, for example:
I started a RepeatModeler analysis for a ~2.4 Gbp genome more than a week ago, and it is still running on the cluster with 64 CPUs.
command: RepeatModeler -pa 64 -engine ncbi -database ID_Test1
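One rough way to pick a walltime is to time a pilot run on a subset of the data and extrapolate. Below is a minimal sketch in Python. It assumes that runtime grows as a power of input size (linear by default) and pads the result with a safety factor. `estimate_walltime` and its parameters are hypothetical, and the scaling assumption may not hold for a given tool, so treat the result only as a starting point:

```python
def estimate_walltime(pilot_seconds: float, pilot_size: float, full_size: float,
                      exponent: float = 1.0, safety: float = 1.5) -> str:
    """Extrapolate a pilot-run time to the full input and return HH:MM:SS.

    Hypothetical helper. Assumes runtime ~ size**exponent
    (exponent=1.0 means linear scaling); the safety factor pads the
    estimate because the true scaling of the tool is unknown.
    """
    if pilot_seconds <= 0 or pilot_size <= 0 or full_size <= 0:
        raise ValueError("times and sizes must be positive")
    # Scale the measured pilot time by the assumed size dependence.
    scale = (full_size / pilot_size) ** exponent
    total = int(round(pilot_seconds * scale * safety))
    # Convert seconds to an HH:MM:SS walltime string.
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, if a pilot on 100 Mbp took one hour, `estimate_walltime(3600, 100e6, 2.4e9)` returns `"36:00:00"` for 2.4 Gbp with the default 1.5x padding. Hours may exceed 24 in this format, which SLURM's `--time` accepts.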
The next step will be a whole-genome pairwise alignment against a chromosome-level genome of similar size, which I want to run with LASTZ or NUCmer.
Experiences and suggestions on how to reduce runtime are welcome. :)
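A common way to cut wall-clock time for a large pairwise alignment is to split one genome by chromosome or scaffold and submit a separate alignment job for each piece. Below is a minimal sketch of the splitting step in Python. It assumes a plain, uncompressed FASTA file. `split_fasta` is a hypothetical helper and is not part of LASTZ or MUMmer:

```python
import os


def split_fasta(path: str, out_dir: str) -> list[str]:
    """Write each FASTA record to its own file and return the new paths.

    Hypothetical helper: one file per chromosome/scaffold lets each
    pairwise alignment run as a separate cluster job.
    """
    os.makedirs(out_dir, exist_ok=True)
    written: list[str] = []
    out = None
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new record starts: close the previous output file.
                    if out is not None:
                        out.close()
                    # Name the file after the first word of the header.
                    fields = line[1:].split()
                    name = fields[0] if fields else f"record_{len(written) + 1}"
                    target = os.path.join(out_dir, name.replace(os.sep, "_") + ".fa")
                    out = open(target, "w")
                    written.append(target)
                # Copy header and sequence lines; skip anything before the first header.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Each returned file can then be passed to a separate job. Balancing the work across jobs depends on how evenly the sequence lengths are distributed.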