My question stems from the fact that single-precision algorithms are vastly faster in GPU implementations. I would like to understand which parts of an optical FDTD simulation would be critically affected by switching from double- to single-precision computation.

I would also like to understand whether it would be convenient (computationally) to perform some parts of the computation in double precision and others in single precision, while storing the data in single precision (my understanding is that under CUDA, for example, conversion from single to double precision can be performed in bulk on large data sets using SIMD-style instructions).
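To make concrete what I mean by mixed precision, here is a minimal CUDA sketch (the kernel name, coefficient, and 1D geometry are hypothetical, just for illustration): the field arrays are stored as float to save memory and bandwidth, while the update arithmetic is promoted to double and rounded back on store.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical 1D FDTD Ez update: fields stored in single precision,
// per-element arithmetic promoted to double, result rounded back to float.
__global__ void update_ez_mixed(float *ez, const float *hy,
                                double coeff, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n) {
        double curl = (double)hy[i] - (double)hy[i - 1];  // promote operands
        ez[i] = (float)((double)ez[i] + coeff * curl);    // store back as float
    }
}

int main()
{
    const int n = 1 << 20;
    float *ez, *hy;
    cudaMallocManaged(&ez, n * sizeof(float));
    cudaMallocManaged(&hy, n * sizeof(float));
    for (int i = 0; i < n; ++i) { ez[i] = 0.0f; hy[i] = 0.001f * i; }

    update_ez_mixed<<<(n + 255) / 256, 256>>>(ez, hy, 0.5, n);
    cudaDeviceSynchronize();

    printf("ez[1] = %f\n", ez[1]);  // 0.5 * (hy[1] - hy[0]) = 0.0005
    cudaFree(ez); cudaFree(hy);
    return 0;
}
```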

In the end, GPU hardware is still limited in terms of available memory (assuming that transfers to host memory are impractical for many fast applications). In addition, NVIDIA, for example, strongly limits double-precision throughput on mid-range hardware and charges a significant premium for double-precision-enabled hardware.
