In my research, I need to incorporate an exact dot-product accumulator, written in Verilog or VHDL, to accumulate the outputs of 32-bit IEEE 754 floating-point multipliers. There are 15 multipliers in total.
Thank you for your reply, madam. I want to implement it as part of my design. I need to accumulate the outputs of fifteen 32-bit floating-point multipliers with reduced rounding error, so rounding should happen only once, at the end, to improve accuracy.
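To make the "round only once" requirement concrete, here is a minimal Python reference model (not HDL) that could serve as a golden model for a testbench. It compares rounding after every multiply and add against keeping the sum exact and rounding a single time. The function names are my own, and the model assumes finite inputs and does not handle overflow to infinity.

```python
from fractions import Fraction
import math

def round_to_f32(x):
    """Round an exact rational to the nearest binary32 value, ties to even.
    Assumption: overflow to infinity is not modelled."""
    x = Fraction(x)
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    m = abs(x)
    # Find e with 2**e <= m < 2**(e+1).
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if Fraction(2) ** e > m:
        e -= 1
    e = max(e, -126)                          # subnormal range shares exponent -126
    q = round(m / Fraction(2) ** (e - 23))    # 24-bit significand, half-to-even
    return sign * math.ldexp(q, e - 23)

def stepwise_dot(a, b):
    """Round after every multiply and every add, as chained FP units would."""
    acc = 0.0
    for x, y in zip(a, b):
        p = round_to_f32(Fraction(x) * Fraction(y))
        acc = round_to_f32(Fraction(acc) + Fraction(p))
    return acc

def exact_dot(a, b):
    """Keep every product and the running sum exact; round once at the end."""
    total = sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))
    return round_to_f32(total)

# Fifteen products with heavy cancellation (all inputs are exact binary32 values).
a = [2.0 ** 24, 1.0, -(2.0 ** 24)] + [0.0] * 12
b = [1.0] * 15
```

With these inputs, `stepwise_dot(a, b)` returns 0.0 because the 1.0 is lost when it is added to 2^24, while `exact_dot(a, b)` returns 1.0.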
You can implement the accumulation sequentially, in the same way as the accumulator in the arithmetic logic unit of a central processing unit.
You can also process them in parallel with a tree of adders: 16 adders in the first stage, followed by 8, then 4, then 2, and then a final adder.
That tree has five stages and handles 32 operands, so the latency drops to five adder periods instead of about 32 periods for a single sequential accumulator. For your 15 products the tree is smaller: four stages (7 adders plus one operand passed through, then 4, 2 and 1) instead of about 15 sequential additions.
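As a quick way to check stage counts for such an adder tree, here is a small Python sketch. The function name `adder_tree_plan` is my own, and it assumes that when a stage has an odd number of operands, the leftover one is passed through to the next stage unchanged.

```python
def adder_tree_plan(n_inputs):
    """Return the number of two-input adders needed in each stage of a
    binary adder tree that reduces n_inputs operands to one sum."""
    plan = []
    n = n_inputs
    while n > 1:
        adders = n // 2          # pair up operands
        plan.append(adders)
        n = adders + (n % 2)     # an odd operand passes through to the next stage
    return plan

plan_15 = adder_tree_plan(15)    # [7, 4, 2, 1]: four stages, 14 adders
plan_32 = adder_tree_plan(32)    # [16, 8, 4, 2, 1]: five stages, 31 adders
```

Note that if each adder in the tree is an ordinary floating-point adder, every stage rounds its result, so the tree alone does not give a single final rounding.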
There may be even faster implementations, which would cost more hardware but take less time.
My dataset has 30 features in total, so the FPGA implementation requires fifteen 32-bit floating-point multipliers. My idea is to add each multiplier output to the previous one and store the running sum in a long interval accumulator, with further processing done in parallel. Would that reduce the time needed for training?
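For the long-accumulator idea, a rough behavioural model in Python (again not HDL) may help with sizing the register. It treats the accumulator as one wide fixed-point integer whose least significant bit has weight 2^-298, which is the smallest possible product of two binary32 subnormals. The names and the width estimate are my own assumptions: products are below 2^256 in magnitude, so 554 magnitude bits cover one product, plus 4 bits of headroom for 15 terms and 1 sign bit, about 559 bits in total. The single final rounding step is not shown.

```python
from fractions import Fraction

IN_SCALE = 149                   # every finite binary32 value is a multiple of 2**-149
ACC_FRAC_BITS = 2 * IN_SCALE     # so every product is a multiple of 2**-298

def to_scaled_int(x):
    """Exact integer n such that x == n * 2**-149."""
    n = Fraction(x) * (1 << IN_SCALE)
    assert n.denominator == 1, "input has bits below 2**-149"
    return n.numerator

def long_accumulate(a, b):
    """Sequential multiply-accumulate into a wide integer: one product per
    iteration, no rounding anywhere."""
    acc = 0
    for x, y in zip(a, b):
        acc += to_scaled_int(x) * to_scaled_int(y)
    return acc

def acc_to_fraction(acc):
    """Interpret the wide accumulator as an exact rational value."""
    return Fraction(acc, 1 << ACC_FRAC_BITS)

# Worst-case magnitude: fifteen products of the largest finite binary32 value.
F32_MAX = (2 - 2.0 ** -23) * 2.0 ** 127
worst_case_bits = long_accumulate([F32_MAX] * 15, [F32_MAX] * 15).bit_length()
```

In this model the sequential version needs one accumulate step per product (15 steps here), so it saves adder hardware rather than time; the exactness comes from the wide accumulator, not from the order of the additions.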