Suppose we have a random vector X (with m components) and a random vector Y (with n components), related by Y = W·X (a single-layer linear neural network). I want to minimize the KL divergence between the distributions of X and Y by gradient descent on W.

When m = n this is easy: W is square, so the density of Y follows from the density of X via the change-of-variables formula, p_Y(y) = p_X(W^{-1} y) / |det W|, and the KL divergence and its gradient can be computed from the Jacobian. But when m ≠ n, W is not square and has no inverse or determinant, and I don't know how to solve it.
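For concreteness, here is a minimal sketch of the square (m = n) case I described above, under the simplifying assumption (not stated in my problem, just for illustration) that X is a zero-mean Gaussian N(0, Sigma), so Y = W·X is also Gaussian N(0, W Sigma W^T) and the KL divergence has a closed form. All names here (Sigma, kl_zero_mean_gaussians, etc.) are illustrative:

```python
import torch

torch.manual_seed(0)
m = 3

# Assumption for this sketch: X ~ N(0, Sigma) with a fixed SPD covariance,
# so Y = W X ~ N(0, W Sigma W^T) and KL(X || Y) has a closed form.
A = torch.randn(m, m)
Sigma = A @ A.T + m * torch.eye(m)       # symmetric positive definite

# Start W near the identity so W Sigma W^T stays positive definite.
W = (torch.eye(m) + 0.1 * torch.randn(m, m)).requires_grad_(True)
opt = torch.optim.SGD([W], lr=1e-2)

def kl_zero_mean_gaussians(S0, S1):
    # KL( N(0,S0) || N(0,S1) )
    #   = 0.5 * ( tr(S1^-1 S0) - m + log det S1 - log det S0 )
    return 0.5 * (torch.trace(torch.linalg.inv(S1) @ S0) - S0.shape[0]
                  + torch.logdet(S1) - torch.logdet(S0))

for step in range(500):
    opt.zero_grad()
    Sy = W @ Sigma @ W.T                  # covariance of Y = W X
    loss = kl_zero_mean_gaussians(Sigma, Sy)
    loss.backward()                       # autograd handles the Jacobian/logdet terms
    opt.step()

print(float(loss))                        # KL(X || Y) is driven toward 0
```

This works because both covariances are m×m. When m ≠ n, W Sigma W^T is n×n while Sigma is m×m, so the two distributions live in spaces of different dimension, and I don't see how to set up the comparison at all.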