The data in your problem appear to consist of soybean seed yields and SNP marker genotypes for individual plants. A common approach is to regress seed yield on the marker genotypes. This is usually done one marker at a time, but that is inefficient because it ignores possible linkage disequilibrium between the markers. Several markers can be fitted simultaneously by multiple regression, provided the number of markers is small enough relative to the number of plants to avoid over-fitting. With hundreds of thousands of SNP markers, as is now common, one has to use penalized regression methods such as ridge regression or the lasso. Both add a penalty that shrinks the regression coefficients of irrelevant markers toward zero; the lasso can set them exactly to zero and so gives a sparse model, whereas ridge regression only shrinks them. A detailed description of such methods can be found in the author's paper 'Linear Models in Genomic Studies', available on ResearchGate.
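
To make the penalized-regression idea concrete, here is a minimal sketch of the lasso fitted by cyclic coordinate descent, written in plain Python so that it runs without extra packages. Everything in it is an assumption made for illustration: the data are simulated rather than real soybean data, genotypes are coded 0/1/2 and drawn independently (so there is no linkage disequilibrium), markers 3 and 17 are arbitrarily chosen as the causal ones, and the penalty `lam=0.3` is picked by hand. The function names (`simulate`, `center`, `soft_threshold`, `lasso`) are hypothetical helpers, not part of any library.

```python
import random


def simulate(n_plants=200, n_markers=30, causal=(3, 17), effect=2.0, seed=0):
    """Simulate toy data (an assumption, not real soybean data).

    Genotypes are coded 0/1/2 and drawn independently, so there is no
    linkage disequilibrium here. Yield depends only on the causal markers.
    """
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_plants)]
    y = [effect * sum(row[j] for j in causal) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


def center(X, y):
    """Subtract the column means so that no intercept term is needed."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return Xc, yc


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_sweeps=100):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ss = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date after each update
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # monomorphic marker carries no information
            # Covariance of marker j with the residual that excludes marker j
            rho = sum(cols[j][i] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_ss[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * cols[j][i]
                beta[j] = new_b
    return beta


X, y = center(*simulate())
beta = lasso(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("markers with non-zero coefficients:", selected)
```

The soft-thresholding step is what gives the lasso its sparsity: any marker whose covariance with the current residual is below `lam` in absolute value receives a coefficient of exactly zero. For real data with very many markers one would normally rely on an established implementation (for example the `Lasso` and `Ridge` estimators in scikit-learn) and choose the penalty by cross-validation rather than by hand.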