29 September 2021

It's very common to use multiple losses. People usually multiply each individual loss by a trade-off factor and sum them, as in the example below (a generator loss in WGAN-GP).

g_loss = -diff + lambda1 * gradient_penalty + lambda2 * mse_loss
g_loss.backward()

So the problem arises: how do I appraise the effect of each loss, so as to tune the trade-off factors lambda? In the WGAN-GP case above, the last term mse_loss is a custom loss I added to the total. How should I adjust the factor lambda2 to ensure that mse_loss takes effect without becoming excessively dominant?

Of course, hyper-parameter tuning may solve this problem, but I'm searching for a more elegant solution: I want to appraise the effect of each loss directly and quantitatively, and set the factors according to that appraisal.

At first glance, I intuitively thought like this:

Ok, I would plot the curve of each loss over training. After comparing the magnitudes, I would assign a larger factor λ to the smaller loss to promote it.

But on deeper thought, I found this is wrong; it makes no sense. It is the gradient of the loss that really matters, and basic calculus tells me that the value of a function f(x) says nothing about its derivative df(x)/dx. Therefore, a loss with a bigger magnitude does not promise a bigger gradient back-propagated to the network, and hence does not promise a larger effect.

I couldn't figure it out, so I've come to ask: is there any good way to appraise each loss's effect directly and quantitatively? Do I have to print the gradient of each loss and analyse them?
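Here is the kind of thing I imagine, a rough sketch rather than a verified recipe: compute each loss's gradient separately with torch.autograd.grad and compare their norms. The tiny linear model, the two stand-in losses, and the final balancing heuristic are all hypothetical placeholders, not the actual WGAN-GP code:

```python
import torch

torch.manual_seed(0)

# Hypothetical tiny model standing in for the generator.
model = torch.nn.Linear(4, 1)
params = [p for p in model.parameters() if p.requires_grad]

x = torch.randn(8, 4)
target = torch.randn(8, 1)
out = model(x)

# Two stand-in losses (in the question these would be the WGAN-GP terms and mse_loss).
loss_a = out.mean()
loss_b = torch.nn.functional.mse_loss(out, target)

def grad_norm(loss):
    # Global L2 norm of d(loss)/d(params); retain_graph=True so the
    # graph survives for the next loss, and .grad buffers stay untouched.
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.sqrt(sum((g ** 2).sum() for g in grads)).item()

na = grad_norm(loss_a)
nb = grad_norm(loss_b)
print(na, nb)

# A crude balancing heuristic: scale lambda2 so both terms push the
# parameters with comparable force.
lambda2 = na / (nb + 1e-12)
```

This prints one norm per loss, so the two terms can be compared on the scale that actually reaches the network.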
