I am using a Random Forest model in R to predict LST (land surface temperature) and comparing the predictions against observed LST. I noticed that model performance is better at lower average LSTs, and I want to identify the likely causes of this behavior. One idea I came up with is the spread of the dataset: I suspect that a lower standard deviation at lower surface temperatures leads to better RF performance, since the variation the model has to capture would be reduced. However, I am trying to confirm this kind of claim in the literature.

So my question is: how does the spread of the data affect the final performance of a machine learning model? For example, does a lower standard deviation of the target lead to better performance? If so, how can I find relevant literature on this?
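One way to probe this hypothesis directly on your own data, before searching the literature, is to bin the test set by observed LST and compare the per-bin standard deviation of the target with the per-bin RMSE. Below is a minimal R sketch, assuming a data frame `df` with a target column `lst` and predictor columns (both names are placeholders, not from your setup):

```r
# Sketch: does per-bin spread of observed LST track per-bin RF error?
# Assumes a data frame `df` with target column `lst`; adjust names as needed.
library(randomForest)

set.seed(42)
idx   <- sample(nrow(df), 0.7 * nrow(df))
train <- df[idx, ]
test  <- df[-idx, ]

rf   <- randomForest(lst ~ ., data = train, ntree = 500)
pred <- predict(rf, newdata = test)

# Split test observations into quintile bins of observed LST
bins <- cut(test$lst,
            breaks = quantile(test$lst, probs = seq(0, 1, 0.2)),
            include.lowest = TRUE)

per_bin <- data.frame(
  bin  = levels(bins),
  sd   = tapply(test$lst, bins, sd),                              # spread per bin
  rmse = tapply((test$lst - pred)^2, bins, function(e) sqrt(mean(e)))  # error per bin
)
print(per_bin)

# A positive association is consistent with (not proof of) the spread hypothesis
cor(per_bin$sd, per_bin$rmse)
```

One caveat worth keeping in mind when interpreting such a comparison: RMSE is scale-dependent, so it will almost mechanically track the local spread of the target. A normalized metric per bin, such as RMSE divided by the bin's standard deviation (or a per-bin R²), separates "the model is genuinely better here" from "there is simply less variance to explain here."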