Hi,

I am rather new to whole-genome analysis, and I still struggle to estimate how long an analysis will run. Would you share your experiences on this topic with me and the ResearchGate community? Jobs on a cluster need a walltime limit, so a good runtime estimate is required. Useful information includes:

  • the program (and, if possible, its version)
  • the genome size
  • the number of genomes involved in the analysis
  • the number of CPUs used
  • the time the analysis ran

For example: I started a RepeatModeler analysis of a ~2.4 Gbp genome over a week ago. It is still running on a cluster with 64 CPUs.

command: RepeatModeler -pa 64 -engine ncbi -database ID_Test1
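
To make reports like this comparable, it may help to convert them to core-hours. Below is a minimal sketch of how I would log a run. The function names `cpu_hours` and `report_run` are made up for illustration and are not part of any tool. The elapsed time of exactly one week (604800 s) is an assumed placeholder, since my job has not finished yet.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for logging a run; names are illustrative only.

# cpu_hours ELAPSED_SECONDS N_CPUS
# Prints whole core-hours (integer division, fractions are dropped).
cpu_hours() {
    local elapsed_s=$1 n_cpus=$2
    echo $(( elapsed_s * n_cpus / 3600 ))
}

# report_run PROGRAM GENOME_SIZE N_GENOMES N_CPUS ELAPSED_SECONDS
# Prints one tab-separated record with the fields listed above,
# followed by the derived core-hours.
report_run() {
    local program=$1 genome_size=$2 n_genomes=$3 n_cpus=$4 elapsed_s=$5
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
        "$program" "$genome_size" "$n_genomes" "$n_cpus" "$elapsed_s" \
        "$(cpu_hours "$elapsed_s" "$n_cpus")"
}

# Assumed example values: one week (604800 s) on 64 CPUs.
report_run RepeatModeler 2.4Gbp 1 64 604800
```

Core-hours are only a rough basis for comparison, because most programs do not scale linearly with the number of CPUs.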

The next step would be a pairwise whole-genome alignment against a chromosome-level genome of similar size. I want to run it with LASTZ or Nucmer.

Experiences and suggestions on how to reduce runtime are welcome. :)
