Hello. As far as I know, differentiating a scalar, vector or matrix with respect to a matrix is not a new idea. A well-known reference is Magnus and Neudecker's "Matrix differential calculus". I have also added a link to a paper in which this type of method is used to analyse a nonlinear least squares algorithm; some of the ideas are summarised in its appendix. The main trick is to define the derivative so that the basic rules of differentiation (the chain rule, etc.) are preserved, so that more complicated problems can be tackled.
Article Analysis of a Nonlinear Least Squares Procedure Used in Glob...
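To make the point about preserving the chain rule concrete, here is a small numerical sketch. It assumes the column-stacking "vec" convention, in which the derivative of a matrix function F(X) is taken to be the Jacobian of vec F with respect to vec X. The matrices A, B and the maps g and f are made up purely for illustration and are not taken from the linked paper. Under this convention the derivative of A X B is the Kronecker product of B^T and A, and the Jacobian of a composition is the ordinary matrix product of the two Jacobians, even though X is not square.

```python
import numpy as np

def vec(M):
    # Stack the columns of M into one long vector (column-major order).
    return M.reshape(-1, order="F")

def numerical_jacobian(f, X, eps=1e-6):
    # Central finite-difference estimate of d vec f(X) / d vec(X)^T.
    x0 = vec(X)
    cols = []
    for i in range(x0.size):
        step = np.zeros_like(x0)
        step[i] = eps
        Xp = (x0 + step).reshape(X.shape, order="F")
        Xm = (x0 - step).reshape(X.shape, order="F")
        cols.append((vec(f(Xp)) - vec(f(Xm))) / (2 * eps))
    return np.stack(cols, axis=1)

# Illustrative (hypothetical) data; X is deliberately non-square.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5))
X = rng.standard_normal((3, 4))

g = lambda X: A @ X @ B   # inner map: vec(A X B) = (B^T kron A) vec(X)
f = lambda Y: Y @ Y.T     # outer map, chosen only as an example

J_g = np.kron(B.T, A)                    # analytic Jacobian of g, 10 x 12
J_f = numerical_jacobian(f, g(X))        # Jacobian of f at Y = g(X), 4 x 10
J_chain = J_f @ J_g                      # chain rule: plain matrix product
J_direct = numerical_jacobian(lambda X: f(g(X)), X)

jacobian_ok = np.allclose(numerical_jacobian(g, X), J_g, atol=1e-6)
chain_ok = np.allclose(J_direct, J_chain, atol=1e-5)
print(jacobian_ok, chain_ok)
```

The appeal of this convention is that every derivative is an ordinary matrix, so the chain rule needs no special index bookkeeping regardless of the shapes involved.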
The difficulty arises when the matrix is not square. In the example that I posted earlier, this was solved by avoiding division by the non-square matrix, but that is not always possible.
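For what it is worth, one common stand-in for "dividing" by a non-square matrix is the Moore-Penrose pseudo-inverse. This is my own assumption about what might help here, not the method used in the earlier example, and the matrices below are invented for illustration. The sketch shows a case where it works (a wide matrix with full row rank has an exact right inverse) and a case where it does not (for a tall matrix the same product is only a projector).

```python
import numpy as np

rng = np.random.default_rng(1)

# Case 1: A is wide (3 x 5) and has full row rank almost surely.
A = rng.standard_normal((3, 5))
X_true = rng.standard_normal((2, 3))
B = X_true @ A                      # we want to "divide" B by A

A_pinv = np.linalg.pinv(A)          # 5 x 3, satisfies A @ A_pinv = I_3 here
X_est = B @ A_pinv                  # recovers X_true because A A^+ = I
recovered = np.allclose(X_est, X_true)

# Case 2: A_tall is 5 x 3, so A_tall @ pinv(A_tall) is a rank-3 projector.
# It cannot equal the 5 x 5 identity, and "division" on the right is not exact.
A_tall = rng.standard_normal((5, 3))
P = A_tall @ np.linalg.pinv(A_tall)
is_identity = np.allclose(P, np.eye(5))

print(recovered, is_identity)
```

Whether the pseudo-inverse is acceptable depends on which side the non-square matrix sits on and on its rank, which is consistent with the remark that the division cannot always be avoided or replaced.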