After training on a dataset, we obtain an MLP model (containing the optimized weights and biases that give the lowest SSE for the particular class/target used in training) that can make predictions for an input.
My question is: what are the final optimal weights and biases of the model if we train it on many samples from multiple classes?
Are they the weights and biases from the last sample of each class, the average of the weights and biases over all samples, or the weights and biases from the sample with the lowest SSE for each class? I would love it if you could explain further or suggest resources that would help me understand this problem. Thanks!
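
For reference, here is a minimal sketch of the kind of training setup I have in mind: a single MLP fit on samples from several classes at once. The dataset, library (scikit-learn), and hyperparameters are just placeholders I chose for illustration; I used MLPRegressor with one-hot targets so the loss is the squared-error (SSE-style) objective I mentioned above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPRegressor

# One dataset containing samples from 3 classes (iris), used only as a placeholder.
X, y = load_iris(return_X_y=True)
Y = np.eye(3)[y]  # one-hot targets so the squared-error loss applies per class output

# One MLP trained on all classes together (hyperparameters are arbitrary).
mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
mlp.fit(X, Y)

# The fitted model exposes its weight matrices and bias vectors per layer.
print([w.shape for w in mlp.coefs_])
print([b.shape for b in mlp.intercepts_])
```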