Hi, I have a question about MobileNet v1.
In the architecture table of the first MobileNet paper, a depthwise convolution with stride 2 and a 7x7x1024 input is followed by a pointwise convolution whose input is also listed as 7x7x1024.
Shouldn't the pointwise layer's input be 4x4x1024 if the depthwise conv layer has stride 2? (Assuming a padding of 1.)
Is this an error on the authors' side, or is there something I've missed between these layers? I've checked several implementations of MobileNet v1, and they all seem to treat this depthwise layer's stride as 1.
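
To make the arithmetic behind my 4x4 expectation explicit, here is a minimal sketch of the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. It assumes a 3x3 depthwise kernel and a padding of 1; the helper name `conv_output_size` is just something I made up for illustration, not part of any library.

```python
def conv_output_size(n, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension.

    Uses floor((n + 2*padding - kernel) / stride) + 1.
    The 3x3 kernel and padding of 1 are assumptions for this layer.
    """
    return (n + 2 * padding - kernel) // stride + 1


# 7x7 input through the depthwise layer, as listed in the table (stride 2)
size_stride2 = conv_output_size(7, stride=2)

# Same layer if the stride is treated as 1, as the implementations I checked do
size_stride1 = conv_output_size(7, stride=1)

print(f"stride 2: {size_stride2}x{size_stride2}x1024")
print(f"stride 1: {size_stride1}x{size_stride1}x1024")
```

Under those assumptions, stride 2 gives a 4x4 output, and only stride 1 keeps the 7x7 size that the table lists as the pointwise layer's input.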