Recently, my attention was drawn to the paper "Stable learning establishes some common ground between causal inference and machine learning", published in Nature Machine Intelligence. After reading it carefully, I have a question about the comparison between transfer learning and stable learning:
The authors compare stable learning with transfer learning in Fig. 3 of the paper. I am not very familiar with the performance of SOTA transfer learning techniques, but here is my concern: in real-world scenarios where the distributions of the test data are known, if we can use transfer learning to maximize Acc_n for each environment n, then we are guaranteed to achieve a higher Avg(Acc_1, ..., Acc_n) than stable learning. Does this mean that stable learning is not so practical or useful in this specific scenario?
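To make the arithmetic behind my concern concrete, here is a toy sketch (my own construction, with made-up accuracy numbers, not taken from the paper): a single stable model must use one set of parameters across all environments, while per-environment adaptation with known test distributions can pick the best performer for each environment separately, so its average accuracy can never be lower.

```python
# Toy illustration with hypothetical accuracies: rows are candidate
# models, columns are test environments 1..3 (numbers are invented).
acc = [
    [0.90, 0.60, 0.70],  # model A
    [0.65, 0.88, 0.72],  # model B
    [0.70, 0.75, 0.85],  # model C
]

n_envs = len(acc[0])

# A single "stable" model must commit to one row for all environments.
best_single_avg = max(sum(row) / n_envs for row in acc)

# Per-environment adaptation (transfer learning with known test
# distributions) picks the best model independently for each environment.
adapted_avg = sum(max(acc[m][e] for m in range(len(acc)))
                  for e in range(n_envs)) / n_envs

print(best_single_avg)  # ~0.767 (model C averaged over all environments)
print(adapted_avg)      # ~0.877 (0.90, 0.88, 0.85 averaged)
assert adapted_avg >= best_single_avg
```

The inequality holds by construction, since the per-environment maximum of any quantity is at least its value for any fixed choice; my question is whether this makes stable learning redundant whenever the test distributions are actually known.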