I am implementing a generic radix-4 DIF FFT (N = 64, 256, 1024, 4096, 16384) on a Xilinx Virtex-6 FPGA. I have two schemes for generating the twiddle factors:
1) Generate sin and cos with CORDIC, store them in memory, and read them back during processing to multiply with the sums of the inputs. This lets me run tasks in parallel to increase speed, but each multiplication increases the vector length (bit width) drastically.
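To make the growth in scheme 1 concrete, here is a minimal Python sketch (not HDL) that finds the worst-case signed width of one complex multiplication. The function names and the 16-bit widths are illustrative assumptions, not values from my design.

```python
from itertools import product

def signed_bits(v):
    """Smallest two's-complement width that can hold the integer v."""
    return v.bit_length() + 1 if v >= 0 else (-v - 1).bit_length() + 1

def complex_product_width(data_bits, twiddle_bits):
    """Worst-case width of (a + jb) * (c + jd) for signed operands.

    Only the corner values are checked, because each output term is
    linear in every operand, so the extremes occur at the range limits.
    """
    d_range = (-(1 << (data_bits - 1)), (1 << (data_bits - 1)) - 1)
    t_range = (-(1 << (twiddle_bits - 1)), (1 << (twiddle_bits - 1)) - 1)
    worst = 0
    for a, b, c, d in product(d_range, d_range, t_range, t_range):
        for v in (a * c - b * d, a * d + b * c):  # real and imaginary parts
            worst = max(worst, signed_bits(v))
    return worst

# Hypothetical widths: 16-bit data and 16-bit twiddle factors.
print(complex_product_width(16, 16))  # 33
```

So a full-precision result needs data_bits + twiddle_bits + 1 bits per stage, before counting the growth from the butterfly additions.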
2) Use CORDIC rotation for the complex multiplication. This does not increase the vector length but reduces speed. The Xilinx CORDIC IP core takes its phase input in 2QN format, but my largest twiddle angle (2*π*k*n/N) needs 3QN format (around -4.71). How can I solve this problem?
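The range problem in scheme 2 can be reproduced with a short Python sketch. It assumes the twiddle factor is exp(-j*2*π*n*q/N) with n in 0..N/4-1 and q in 0..3, and that the core expects a phase within [-π, π]. The names `n`, `q` and `wrapped_twiddle_angle` are illustrative. Because a rotation is periodic in 2π, reducing the integer index modulo N before scaling gives an equivalent angle inside that range. This is one possible approach, not a verified fix for the IP core.

```python
import math

def wrapped_twiddle_angle(n, q, N):
    """Angle of exp(-j*2*pi*n*q/N), reduced to the range [-pi, pi]."""
    p = (n * q) % N        # the rotation only depends on n*q modulo N
    if p > N // 2:         # map the upper half of the circle to negative indices
        p -= N
    return -2.0 * math.pi * p / N

N = 16384
n, q = N // 4 - 1, 3       # worst case for a radix-4 stage
raw = -2.0 * math.pi * n * q / N
wrapped = wrapped_twiddle_angle(n, q, N)
print(raw, wrapped)        # raw is about -4.71, wrapped is about 1.57
```

Both angles describe the same rotation, so the wrapped value could be converted to the fixed-point phase word instead of the raw one.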