Are there any efficient and effective alternatives to 1-D depth-wise convolution that will increase classification accuracy and also reduce the size of the network, i.e. the number of learnable parameters?
I suggest not using 1-D depth-wise convolutions. In some cases, factorized convolution gives much better results at a lower cost. For example, instead of a (5*5) kernel, use a (1*5) followed by a (5*1); the number of parameters per kernel drops from 25 to 10.
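The saving above is per kernel; for a full layer it also depends on the channel counts. A minimal sketch of the comparison, with illustrative channel sizes that are my assumption and not part of the answer:

```python
# Hypothetical sketch: parameter count of one full k*k convolution layer
# vs. its factorized (1*k then k*1) replacement. Channel counts are
# illustrative assumptions, not from the answer above.

def conv2d_params(in_ch, out_ch, kh, kw, bias=True):
    """Learnable parameters of a standard 2-D convolution layer."""
    return out_ch * (in_ch * kh * kw + (1 if bias else 0))

in_ch, out_ch, k = 64, 64, 5

full = conv2d_params(in_ch, out_ch, k, k)           # one 5*5 conv
factored = (conv2d_params(in_ch, out_ch, 1, k)      # a 1*5 conv ...
            + conv2d_params(out_ch, out_ch, k, 1))  # ... followed by a 5*1 conv

print(full, factored)  # the factorized pair uses roughly 40% of the parameters
```

Note that the factorized pair only matches a full 5*5 convolution exactly when the original kernel is rank-1; in practice the network simply learns within that restricted family.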
A good alternative is a Bayesian parametric modelling framework. Here, instead of a convolution (an integral of products of shifted functions), there is a discrete sum of products of probability distributions. However, computing the normalizing constants can be a challenge. The simplest models are the angular central Gaussian or von Mises-Fisher distributions on the unit hypersphere, or the generalized Bingham or Fisher distributions for the matrix case.
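To illustrate the normalizing-constant point, here is a minimal sketch of the von Mises-Fisher log-density restricted to the ordinary sphere S^2 (dimension p = 3), where the constant happens to have the closed form kappa / (4*pi*sinh(kappa)); in higher dimensions it involves modified Bessel functions and is less convenient. This is my own illustration, not code from the answer:

```python
import numpy as np

def vmf_log_density_s2(x, mu, kappa):
    """Log-density of the von Mises-Fisher distribution f(x) proportional
    to exp(kappa * mu.T @ x) on the unit sphere S^2, using the closed-form
    normalizing constant C_3(kappa) = kappa / (4*pi*sinh(kappa))."""
    log_c = np.log(kappa) - np.log(4 * np.pi) - np.log(np.sinh(kappa))
    return log_c + kappa * np.dot(mu, x)

mu = np.array([0.0, 0.0, 1.0])  # mean direction (unit vector)
kappa = 2.0                      # concentration parameter

# The density peaks at the mean direction and is smallest at its antipode.
print(vmf_log_density_s2(mu, mu, kappa), vmf_log_density_s2(-mu, mu, kappa))
```

For p > 3 (or for the Bingham and matrix Fisher families mentioned above) the constant generally has no elementary closed form, which is exactly where the computational difficulty lies.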