It depends on what type of QSAR study you carry out. In 2D QSAR you can take structurally diverse compounds tested on the same target. In 3D QSAR, alignment-dependent or template-dependent descriptors are necessary and you need analogous compounds, but you can include others too, provided they are properly aligned when the 3D descriptors are generated. In 4D or some newer QSAR methodologies you need to generate field-based descriptors, conformational ensembles from molecular dynamics, or interaction-energy-based descriptors. We can use structurally analogous compounds or diverse compounds for a single target. So in my opinion, a data set of structurally analogous compounds usually gives very good QSAR models. A data set of structurally diverse compounds should also give a good QSAR model, provided the QSAR method is appropriate and the biological activity data are reliable. We can justify an outlier in a QSAR model in many ways. I don't know for sure, but if we carry out 2D QSAR on phenols for their antimicrobial property, 2-naphthol would very likely be one of the outliers because of its different 2D properties.
Depending on the diversity of the training set and the resulting descriptor values, you will get an applicability domain for your QSAR model. If the investigated structure is too far from the descriptor space of your model, you won't be able to get a reasonable result for this structure with this specific model.
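To make the idea of an applicability domain concrete, here is a minimal sketch that treats the domain as the per-descriptor range spanned by the training set (a simple "bounding box"). This is only one of several possible definitions; the descriptor columns, the example values and the function names are made up for illustration.

```python
from typing import Sequence

def descriptor_ranges(training: Sequence[Sequence[float]]) -> list[tuple[float, float]]:
    """Return (min, max) of every descriptor column over the training set."""
    columns = list(zip(*training))
    return [(min(col), max(col)) for col in columns]

def outside_domain(query: Sequence[float], ranges, tolerance: float = 0.0) -> list[int]:
    """Return the indices of descriptors whose value lies outside the training range.

    `tolerance` widens each range by a fraction of its width (an arbitrary choice).
    """
    flagged = []
    for i, (value, (lo, hi)) in enumerate(zip(query, ranges)):
        margin = tolerance * (hi - lo)
        if value < lo - margin or value > hi + margin:
            flagged.append(i)
    return flagged

# Hypothetical descriptor table: rows are compounds,
# columns are (logP, molecular weight, H-bond donors).
training = [
    [1.2, 180.0, 1],
    [2.5, 250.0, 2],
    [3.1, 310.0, 0],
]
ranges = descriptor_ranges(training)

print(outside_domain([2.0, 220.0, 1], ranges))  # [] -> inside the box
print(outside_domain([6.4, 520.0, 1], ranges))  # [0, 1] -> logP and MW out of range
```

An empty list only says that the query lies inside the box; it does not guarantee a good prediction, since the box can contain large empty regions.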
Compounds whose descriptor values do not occur among the remaining structures of the data set are very likely to become outliers, because the regression equation will fail to fit these values. For example, if only one compound contains a halogen, that structure is likely to appear as an outlier. Your data set should therefore be similar not only in structure but also in descriptor values. The same problem appears if the molecules in the test set are too different from the training set, which causes much lower values for r².
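A quick way to spot such cases before fitting is to look for descriptors that are populated by only one compound (or very few). The sketch below assumes count-type descriptors where 0 means "feature absent"; the descriptor names, the data and the `min_support` threshold are hypothetical.

```python
def sparsely_populated(descriptors, names, min_support=2):
    """Map descriptor name -> row indices, for columns where fewer than
    `min_support` compounds (but at least one) have a nonzero value."""
    flagged = {}
    for j, name in enumerate(names):
        rows = [i for i, row in enumerate(descriptors) if row[j] != 0]
        if 0 < len(rows) < min_support:
            flagged[name] = rows
    return flagged

# Hypothetical count descriptors for five compounds.
names = ["n_halogen", "n_hydroxyl", "n_aromatic_rings"]
descriptors = [
    [0, 1, 1],
    [0, 1, 1],
    [1, 1, 1],  # the only halogenated compound
    [0, 2, 2],
    [0, 2, 1],
]

print(sparsely_populated(descriptors, names))  # {'n_halogen': [2]}
```

Compounds flagged this way are candidates for closer inspection, not automatic removal; for continuous descriptors a range or distance check would be more appropriate.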
One way to take care of these issues is to do a cluster analysis of the descriptor values, including the activity: use the molecules that are the centroids of the clusters for the test set and all others for the training set. Your test set then consists of "typical" molecules that cover the whole range of the activity.
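The cluster-based split could be sketched roughly as follows. This is a sketch under several assumptions: plain k-means is used (the answer does not name a clustering method), columns are standardized first, and since k-means centroids are generally not real molecules, the molecule closest to each centroid is taken as the test compound. All names and data are illustrative.

```python
import math

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iterations):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def centroid_split(descriptors, activities, k):
    """Test set = molecule nearest to each cluster center; the rest is training."""
    table = standardize([list(d) + [a] for d, a in zip(descriptors, activities)])
    centers, labels = kmeans(table, k)
    test = []
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            test.append(min(members, key=lambda i: dist(table[i], centers[c])))
    test = sorted(test)
    train = [i for i in range(len(table)) if i not in test]
    return train, test

# Hypothetical data: two descriptors per compound plus an activity value.
descriptors = [[1.0, 100], [1.1, 105], [0.9, 95],
               [3.0, 300], [3.1, 310], [2.9, 290],
               [5.0, 500], [5.2, 510], [4.8, 490]]
activities = [4.0, 4.1, 3.9, 6.0, 6.2, 5.8, 8.0, 8.1, 7.9]

train, test = centroid_split(descriptors, activities, k=3)
print("train:", train)
print("test:", test)
```

The number of clusters `k` fixes the size of the test set, so it has to be chosen with the overall data set size in mind.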
Whether it is "allowed" to mix diverse compounds depends on what you would like to model.
If you do QSAR on some off-target in the field of ADMET, you will probably have diverse structures, and that is OK, although by the usual figures of merit the model will not be as good as one for a narrow series. This is mainly because the interactions with these targets are rather unspecific.
If you want to model specific target interactions, you should make sure that the molecules either address the same target features in descriptor-based models or are properly aligned for field-based methods like CoMFA. Tripos just published a new flavor of this, called template CoMFA, which allows you to mix prior knowledge about core alignments with their topomer CoMFA fields.
For all QSARs the same rule applies: features that are not present in your training set will not be predictable either, because the models do not extrapolate.
And: as long as you are only interested in creating a model that is well-behaved on your test set, the strategy with the cluster centers, or similar approaches, works well. In real life in industry, your model has to be tailored to predict the future well. Read through some AstraZeneca publications on this topic; their concepts work quite well.
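One common way to mimic "predicting the future" during validation is a time-based split: train on the older compounds and test on the newer ones. This is a generic sketch and not necessarily what the publications mentioned above propose; the record layout, compound IDs and cutoff date are hypothetical.

```python
from datetime import date

def time_split(records, cutoff):
    """Train on compounds registered before `cutoff`, test on those registered later."""
    train = [r for r in records if r["registered"] < cutoff]
    test = [r for r in records if r["registered"] >= cutoff]
    return train, test

# Hypothetical compound records with a registration date and a measured activity.
records = [
    {"id": "cpd-1", "registered": date(2019, 3, 1), "activity": 5.1},
    {"id": "cpd-2", "registered": date(2019, 9, 15), "activity": 6.0},
    {"id": "cpd-3", "registered": date(2020, 6, 30), "activity": 6.4},
    {"id": "cpd-4", "registered": date(2021, 2, 10), "activity": 7.2},
]

train, test = time_split(records, cutoff=date(2020, 1, 1))
print([r["id"] for r in train])  # ['cpd-1', 'cpd-2']
print([r["id"] for r in test])   # ['cpd-3', 'cpd-4']
```

Figures of merit from such a split are usually lower than from a cluster-based or random split, but they are closer to how the model will actually be used.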