I am training machine learning models to predict the binding affinity of small molecules to a protein receptor. While preparing the training set I remove very small and very large molecules, as they are more likely to bind non-specifically to the receptor and abolish its function. Currently, I use z-score thresholds on molecular weight (MW): -2.5 for the very small and 2.5 for the very large molecules. However, the z-score depends on the MW distribution of whatever data happen to be available, so I believe that absolute lower and upper MW thresholds would be more reliable. In your experience, what should these MW thresholds be?
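To make the comparison concrete, here is a minimal sketch of the two filtering schemes, assuming the MW of each molecule has already been computed and is available as a plain list of floats in Da. The function names are hypothetical, the population standard deviation is an assumption about how the z-score is computed, and the absolute cutoffs in the usage example are arbitrary placeholders, not recommended values.

```python
from statistics import mean, pstdev


def zscore_cutoffs(mws, z_low=-2.5, z_high=2.5):
    """Return the MW values (Da) that the z-score limits correspond to.

    These cutoffs shift whenever the MW distribution of the dataset changes.
    Assumption: population standard deviation is used for the z-score.
    """
    mu = mean(mws)
    sigma = pstdev(mws)
    return mu + z_low * sigma, mu + z_high * sigma


def zscore_filter(mws, z_low=-2.5, z_high=2.5):
    """Keep molecules whose MW z-score lies within [z_low, z_high]."""
    lo, hi = zscore_cutoffs(mws, z_low, z_high)
    return [mw for mw in mws if lo <= mw <= hi]


def fixed_filter(mws, mw_min, mw_max):
    """Keep molecules whose MW lies within absolute limits (Da).

    mw_min and mw_max are exactly the values this question asks about,
    so they are required arguments with no defaults.
    """
    return [mw for mw in mws if mw_min <= mw <= mw_max]


if __name__ == "__main__":
    # Toy data: nine molecules of 400 Da and one of 2000 Da.
    mws = [400.0] * 9 + [2000.0]
    print(zscore_cutoffs(mws))             # dataset-dependent cutoffs
    print(len(zscore_filter(mws)))         # molecules kept by the z-score rule
    # 150 and 900 are placeholders for illustration only.
    print(len(fixed_filter(mws, 150.0, 900.0)))
```

With the z-score rule the effective MW window is recomputed for every dataset, whereas the fixed rule applies the same window everywhere, which is why I am looking for sensible absolute values.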
PS: Please don't point me to Lipinski's rule of 5 (180 to 500) or a related rule. When training a machine learning model on known data you have to be more "generous"; otherwise you will discard a lot of precious experimental data.