Is there a standard criterion for choosing the dimensions of the input images fed to a CNN model's convolution layer, or can the dimensions vary with the audio input?
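
To make the question concrete, here is a minimal sketch of how the image size depends on the audio, assuming the "images" are magnitude spectrograms from a short-time Fourier transform. The function names (`spectrogram_shape`, `fix_length`, `magnitude_spectrogram`) and the values `n_fft=512`, `hop_length=256`, and a 16 kHz sample rate are illustrative assumptions, not taken from any particular library or paper. The height is set by the FFT size, while the width grows with clip length unless every clip is first padded or truncated to a common length.

```python
import numpy as np

def spectrogram_shape(n_samples, n_fft=512, hop_length=256):
    """Return (freq_bins, frames) for an STFT with no edge padding (assumed setup)."""
    if n_samples < n_fft:
        raise ValueError("signal is shorter than one FFT window")
    freq_bins = n_fft // 2 + 1                       # fixed by n_fft only
    frames = 1 + (n_samples - n_fft) // hop_length   # grows with clip length
    return freq_bins, frames

def fix_length(signal, target_len):
    """Zero-pad or truncate a 1-D signal to exactly target_len samples."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))

def magnitude_spectrogram(signal, n_fft=512, hop_length=256):
    """Magnitude STFT as a 2-D array of shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 16000                                # assumed sample rate
one_sec = np.random.randn(sr)             # stand-in for a 1 s clip
two_sec = np.random.randn(2 * sr)         # stand-in for a 2 s clip

print(spectrogram_shape(len(one_sec)))    # width differs between the clips
print(spectrogram_shape(len(two_sec)))

# Forcing a common length gives every clip the same image size.
fixed = magnitude_spectrogram(fix_length(two_sec, sr))
print(fixed.shape)
```

Under these assumptions the variable dimension is only the time axis, so the choice comes down to fixing the clip length before the transform or using a network that tolerates variable width.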
