I am trying to optimize my simulator by leveraging runtime compilation. My code is long and complex, but I have identified a specific __device__ function whose performance could be greatly improved by removing all of its global memory accesses.

Does CUDA allow a single __device__ function (not a __global__ one) to be compiled and linked at runtime, in order to "override" an existing function?

Additional information:

- The function is a normal __device__ function.

- It is not part of a class or structure.

- What changes between models is not the data type, so I cannot rely on templates.

- I actually need to change the calculations performed in the function (i.e., the propensity calculations) according to the model being simulated.

Thank you very much indeed for your answers.
