When training a radial basis function network (RBFN) for classification, the output weights are usually obtained via the pseudo-inverse of the design matrix, which minimizes the mean squared error (MSE) and yields the globally optimal output weights for the given radial basis function parameters.
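For concreteness, this is the closed-form solution I mean (my notation: $\Phi$ is the design matrix of basis function activations, $T$ the matrix of one-hot targets, $W$ the output weights):

$$
W = \Phi^{+} T = (\Phi^{\top}\Phi)^{-1}\Phi^{\top} T
$$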
However, classification tasks are more commonly handled with a softmax output layer trained on a log-loss (cross-entropy) objective.
When the RBFN is instead trained by gradient descent, we are no longer restricted to the MSE. As I wasn't able to find any research on this, I am wondering whether a softmax layer with log-loss offers an advantage in this case, or whether it would yield results similar to the linear regression approach.
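To make the comparison concrete, here is a minimal sketch of the two training routes, assuming fixed, pre-selected Gaussian RBF centers and width (the helper names like `rbf_features` and the toy data are mine, not from any library):

```python
import numpy as np

def rbf_features(X, centers, gamma):
    """Design matrix of Gaussian RBF activations: phi_ij = exp(-gamma * ||x_i - c_j||^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

# Toy data: two Gaussian blobs, labels encoded as a one-hot target matrix T
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.repeat([0, 1], 50)
T = np.eye(2)[y]

centers = X[rng.choice(len(X), 10, replace=False)]  # fixed centers sampled from the data
Phi = rbf_features(X, centers, gamma=0.5)

# (a) Linear regression route: pseudo-inverse minimizes the MSE in closed form.
W_mse = np.linalg.pinv(Phi) @ T

# (b) Softmax + log-loss route: gradient descent on the mean cross-entropy.
W_ce = np.zeros((Phi.shape[1], 2))
lr = 0.1
for _ in range(500):
    P = softmax(Phi @ W_ce)
    grad = Phi.T @ (P - T) / len(X)  # gradient of the cross-entropy w.r.t. W
    W_ce -= lr * grad

for name, W in [("MSE / pinv", W_mse), ("softmax / log-loss", W_ce)]:
    acc = ((Phi @ W).argmax(axis=1) == y).mean()
    print(f"{name}: train accuracy = {acc:.2f}")
```

On this separable toy problem both routes classify well; my question is about whether they differ in less benign settings.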
Intuitively, the logistic-regression-style approach should cope better with classification (label) noise than the linear regression approach. Is this correct?