I am using a Tesla K20. I get an error saying that shared memory is limited to 16 KB, although the K20 supports up to 48 KB. How do I configure the GPU and the nvcc compiler to use 48 KB of shared memory instead of 16 KB?
The Tesla K20 is a compute capability 3.5 device. Compile your application with "compute_35,sm_35" as the code generation setting in MSVC, or with "-arch sm_35" on the nvcc command line. Targeting 3.5 will also open up the possibility of leveraging multiple compute streams on your device.
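As a rough check that the 48 KB limit is actually in effect, something like the sketch below could be compiled with the flag above. The file name `smem48.cu`, the kernel name `fillShared`, and the tile size are made up for illustration; the runtime calls (`cudaGetDeviceProperties`, `cudaFuncSetCacheConfig`) are standard CUDA runtime API. It assumes device 0 is the K20 and that your toolkit still supports sm_35.

```cuda
// Build (assumed file name; requires a toolkit that still supports sm_35):
//   nvcc -arch sm_35 -o smem48 smem48.cu
#include <cstdio>
#include <cuda_runtime.h>

// 12288 floats * 4 bytes = 49152 bytes = 48 KB of static shared memory.
// Compiling this for a 1.x target fails with a "too much shared data" error.
#define TILE_FLOATS 12288

__global__ void fillShared(float *out)
{
    __shared__ float tile[TILE_FLOATS];

    // Each thread fills a strided slice of the tile.
    for (int i = threadIdx.x; i < TILE_FLOATS; i += blockDim.x)
        tile[i] = static_cast<float>(i);
    __syncthreads();

    // Thread 0 copies the last element out so the host can inspect it.
    if (threadIdx.x == 0)
        out[0] = tile[TILE_FLOATS - 1];
}

int main()
{
    // Report what the runtime says about device 0 (assumed to be the K20).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s: compute capability %d.%d, %lu bytes shared memory per block\n",
           prop.name, prop.major, prop.minor,
           (unsigned long)prop.sharedMemPerBlock);

    // Optional hint: prefer the larger shared memory / smaller L1 split.
    cudaFuncSetCacheConfig(fillShared, cudaFuncCachePreferShared);

    float *dOut = NULL;
    float hOut = 0.0f;
    cudaMalloc((void **)&dOut, sizeof(float));

    fillShared<<<1, 256>>>(dOut);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel run failed: %s\n", cudaGetErrorString(err));
        cudaFree(dOut);
        return 1;
    }

    printf("last tile element: %.1f\n", hOut);
    cudaFree(dOut);
    return 0;
}
```

If the reported shared memory per block is 49152 bytes and the kernel runs without an error, the 16 KB limit came from the compile target rather than from the hardware.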
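It also seems that you're using an old version of the CUDA toolkit. Older toolkits default to a compute capability 1.x target, and 1.x devices are limited to 16 KB of shared memory, which is where your error comes from. Since version 7.0 (or maybe even 6.5, I don't quite remember now), compute capabilities 1.0 and 1.1 are deprecated and the toolkit uses 2.0 as the default setting.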