Hi,
I trained 32 features with sparse filtering on the MNIST dataset. My idea was to use these weights to initialize the first convolutional layer of my deep network, which is based on the Deep MNIST tutorial from TensorFlow.
My problem is that the pretrained network starts slightly worse and ends slightly worse (98.7% vs. 99.35%) than initializing with random values as in the tutorial.
In my opinion, the pretrained features looked promising; see the attached file.
My question is whether I have errors in my code. The features were saved with NumPy:
np.save("weights", weights)
with a shape of (32, 25), which represents 32 features of size 5 × 5.
The problem is that TensorFlow expects the filter weights for this layer to have a shape of (5, 5, 1, 32).
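(As a side note, assuming each saved row is a row-major flattening of one 5 × 5 filter, the conversion can also be done in a single step with reshape and transpose; this is just a sketch of the equivalent operation:)

import numpy as np

# (32, 25) -> (32, 5, 5) -> (5, 5, 32) -> (5, 5, 1, 32)
weights = np.load('weights.npy').astype(np.float32)
reshaped = weights.reshape(32, 5, 5).transpose(1, 2, 0)[:, :, np.newaxis, :]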
So, this is how I initialize the first layer with the weights:
import numpy as np
import tensorflow as tf

def get_pre_trained_weights():
    # Load the (32, 25) sparse filtering features and rearrange them
    # into the (5, 5, 1, 32) filter layout expected by tf.nn.conv2d.
    pre_trained_weights = np.load('weights.npy')
    pre_trained_weights = pre_trained_weights.astype(np.float32)
    reshape_weights = np.ones((5, 5, 1, 32), dtype=np.float32)
    for row in range(32):
        reshape_weights[:, :, 0, row] = pre_trained_weights[row, :].reshape(5, 5)
    return reshape_weights

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

pre_trained_weights = get_pre_trained_weights()
W_conv1 = tf.Variable(pre_trained_weights)
b_conv1 = bias_variable([32])  # bias_variable as defined in the tutorial
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
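(For completeness, here is a quick sanity check of the reshape, confirming that each reshaped filter round-trips back to the saved feature row it came from:)

# Sanity check: each reshaped filter should match its saved feature row.
saved = np.load('weights.npy').astype(np.float32)   # shape (32, 25)
reshaped = get_pre_trained_weights()                # shape (5, 5, 1, 32)
for i in range(32):
    assert np.allclose(reshaped[:, :, 0, i].ravel(), saved[i])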
Any ideas why the pretrained initialization is not only failing to beat random initialization, but actually ends up slightly worse?