Square kernels are preferred because a pattern of interest may occur either horizontally or vertically; a square filter can extract both orientations, which keeps the operation symmetric with respect to direction. When prior knowledge of the image geometry is available, non-square kernels can also be used.
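To make the directionality point concrete, here is a minimal sketch (plain NumPy, with a hand-rolled valid-mode cross-correlation) showing that a 1×3 kernel responds to a vertical edge while its 3×1 transpose does not; a square kernel can learn either orientation, which is why square shapes are the default absent a directional prior:

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation (the operation CNN layers compute)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Image with a vertical edge down the middle.
image = np.zeros((5, 5))
image[:, 2:] = 1.0

k_horiz = np.array([[-1.0, 0.0, 1.0]])  # 1x3: fires on vertical edges
k_vert = k_horiz.T                      # 3x1: fires on horizontal edges

resp_h = cross_correlate2d(image, k_horiz)  # strong response at the edge
resp_v = cross_correlate2d(image, k_vert)   # no response anywhere
```

The 1×3 kernel produces a nonzero response at the edge, while the 3×1 kernel outputs all zeros on this image; swapping the edge orientation swaps the roles.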
M. Jacquemont, L. Antiga, T. Vuillaume, G. Silvestri, A. Benoit, P. Lambert, and G. Maurin, "Indexed operations for non-rectangular lattices applied to convolutional neural networks".
C. Choy, J. Gwak, and S. Savarese, "4D spatio-temporal ConvNets: Minkowski convolutional neural networks", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
Square kernels are used when we have no preference between horizontal and vertical patterns; the symmetric shape keeps the search for features direction-agnostic.
Asymmetric kernels appear in Inception networks, where an n×n convolution is factorized into an n×1 convolution followed by a 1×n convolution. The stacked pair has the same receptive field as the n×n kernel but uses fewer parameters (2n rather than n² weights per input/output channel pair), making it more efficient.
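The parameter saving from this factorization can be checked with a short calculation (a sketch; the layer sizes `n = 7` and `c = 64` are illustrative choices, not taken from a specific Inception variant):

```python
def conv_params(c_in, c_out, kh, kw, bias=True):
    """Parameter count of one 2D conv layer: weights plus optional bias."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)

n, c = 7, 64  # kernel size and channel count (illustrative values)

# Single n x n convolution.
full = conv_params(c, c, n, n)

# Inception-style factorization: n x 1 followed by 1 x n.
factored = conv_params(c, c, n, 1) + conv_params(c, c, 1, n)

print(full, factored)          # 200768 57472
print(round(full / factored, 2))  # ~3.5x fewer parameters
```

Both paths cover the same n×n receptive field, but the factored version scales linearly rather than quadratically in n, so the saving grows with kernel size.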
In comparison to rectangular matrices, square matrices have attractive computational properties, e.g., possible symmetry and closure under multiplication and addition. See, e.g., the article "Matrix Comparison, Goodness-of-Fit, and Spatial Interaction Modeling".