Hi,

this paper

https://arxiv.org/pdf/1502.01852.pdf

suggests initializing weights differently for ReLU and PReLU activations. As far as I understand it, I should initialize the weights of the first layer with

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D
from keras.initializers import RandomNormal

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer=RandomNormal(stddev=np.sqrt(1 / (img_rows * img_cols))),
                 input_shape=input_shape))

and the weights from the second layer onwards with:

model.add(Conv2D(64, (3, 3), activation='relu',
                 kernel_initializer=RandomNormal(stddev=np.sqrt(2 / (3 * 3 * 32)))))

See equation 10 on page 4.
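
If I read it correctly, Eqn. (10) calls for a zero-mean Gaussian with standard deviation sqrt(2 / n_l), where n_l = k_l^2 * c_l is the squared kernel size times the number of input channels of layer l. To keep my numbers straight, I compute the standard deviation with a small helper like this (the function name is my own, just for illustration):

import numpy as np

def he_std(kernel_size, in_channels):
    # Eqn. (10): Var[w_l] = 2 / n_l, with n_l = k_l^2 * c_l
    fan_in = kernel_size * kernel_size * in_channels
    return np.sqrt(2.0 / fan_in)

print(he_std(3, 32))  # second layer above: sqrt(2 / 288) ~= 0.083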

For the first layer they write:

"For the first layer (l = 1), we should have n1 Var[w1] = 1

because there is no ReLU applied on the input signal. But

the factor 1/2 does not matter if it just exists on one layer.

So we also adopt Eqn.(10) in the first layer for simplicity."
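
To make sure I am reading n_1 correctly: for my first layer (3x3 kernel, one grayscale channel) I see more than one plausible value, and they give very different standard deviations (this is just my own arithmetic, not from the paper):

import numpy as np

# Eqn. (10) with n_1 = k^2 * d = 3*3*1 (3x3 kernel, one input channel)
print(np.sqrt(2.0 / (3 * 3 * 1)))   # ~0.471
# the same n_1, but with "n_1 Var[w_1] = 1" (no factor 2)
print(np.sqrt(1.0 / (3 * 3 * 1)))   # ~0.333
# what my code above actually uses: n_1 = number of input pixels
print(np.sqrt(1.0 / (28 * 28)))     # ~0.036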

But sadly, this performs slightly worse both on MNIST and on my own data set.

On MNIST, with this special initialization, the accuracy starts at 0.9655 and peaks at 0.9895.

With glorot_uniform (Xavier uniform) it starts at 0.9763 and peaks at 0.9905.

This is reproducible. Did I get the formula wrong? Is my implementation correct?
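
For reference, Keras also ships a built-in he_normal initializer (variance scaling with factor 2 over the fan-in, drawn from a truncated normal), so the comparison can be reproduced just by swapping the initializer string. A minimal sketch of what I am comparing, with the layer sizes from above (the helper name and the explicit MNIST input shape are my own choices; the dense layers, compilation and training are omitted):

from keras.models import Sequential
from keras.layers import Conv2D

def conv_stack(init):
    # identical architecture; only the kernel initializer differs
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu',
                     kernel_initializer=init, input_shape=(28, 28, 1)))
    model.add(Conv2D(64, (3, 3), activation='relu',
                     kernel_initializer=init))
    return model

he_model = conv_stack('he_normal')            # He et al. (2015) initialization
xavier_model = conv_stack('glorot_uniform')   # Xavier / Glorot uniform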
