I'm training a neural network (E3D-LSTM) for spatio-temporal video prediction. It's quite a large network with millions of parameters. The model trains perfectly on my 12 CPU cores, but when assigned to my NVIDIA GTX card, an OOM error stating that the "tensor size is too big to be assigned" is thrown. I tried reducing the batch size, but the error persists. I'm surprised that the CPU had no memory issue with such a big network, yet the GPU cannot train the model. Can somebody shed some light on why this is happening and what can be done to fix it? Please find attached the log file with the complete error log.

Edit: The model does train on the GPU with very small batch sizes, but not with reasonable ones.
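One workaround I'm considering is gradient accumulation: run several small micro-batches (which do fit in GPU memory) and apply a single optimizer step, emulating a larger effective batch. A minimal sketch of the idea, in PyTorch and purely illustrative; the model and sizes here are placeholders, not the actual E3D-LSTM:

```python
# Sketch: gradient accumulation to emulate a larger effective batch size
# on limited GPU memory. The tiny linear model stands in for the real network.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4                 # 4 micro-batches of 2 ~ one batch of 8
data = torch.randn(8, 16)
target = torch.randn(8, 1)

opt.zero_grad()
for i in range(accum_steps):
    xb = data[i * 2:(i + 1) * 2]
    yb = target[i * 2:(i + 1) * 2]
    loss = loss_fn(model(xb), yb) / accum_steps  # scale so grads average
    loss.backward()                              # grads accumulate in .grad
opt.step()                                       # one update per 4 micro-batches
```

Only one micro-batch's activations are held in GPU memory at a time, at the cost of extra forward/backward passes per update.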
