I was wondering whether it is possible to replace a model's loss function with a final layer whose activation function is the loss function itself. If so, how would this affect the network? How would gradient descent work? And how can I obtain the predictions if I do this?
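To make the setup concrete, here is a minimal sketch of what I have in mind. It assumes a toy one-neuron linear model and a squared-error loss; the names `predict`, `loss_layer`, and `train` are hypothetical and only for illustration. The loss layer takes the prediction and the target as inputs, training minimises its output directly, and the prediction is read from the layer just before it.

```python
# Toy illustration only: a one-neuron "network" made of a prediction
# layer followed by a loss layer. All names here are hypothetical.

def predict(w, b, x):
    """Prediction layer: the output used at inference time."""
    return w * x + b

def loss_layer(y_hat, y):
    """Final 'layer' whose activation is the squared-error loss.
    Note that it needs the target y as a second input."""
    return (y_hat - y) ** 2

def train(data, lr=0.05, epochs=500):
    """Plain stochastic gradient descent on the output of the loss layer."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(w, b, x)
            # Gradient through the loss layer: d(loss)/d(y_hat)
            d_yhat = 2.0 * (y_hat - y)
            # Chain rule back through the prediction layer
            w -= lr * d_yhat * x
            b -= lr * d_yhat
    return w, b

# Noise-free samples of y = 2x + 1 (made-up data for the sketch)
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)

# At inference time the loss layer is skipped: no target is available,
# so the prediction comes from the layer before it.
y_new = predict(w, b, 4.0)
```

In this sketch the gradient computation is the same as with an ordinary loss function; the only differences are that the target becomes an input to the last layer and that predictions have to be taken from the intermediate output rather than from the end of the network.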
