A QSAR model has been obtained for 45 compounds, and the applicability domain of this model has been determined. I want to know whether the number of compounds in the training set influences the applicability domain of a QSAR model.
I guess it can have an influence, depending on the structures you have in the dataset and on how you define the applicability domain (AD). If you define the AD a priori, for example because the model was developed from a specific chemical class, then the number of compounds probably matters less, since the model is valid only for that particular chemical class anyway. But if you use a posteriori approaches to define the AD, which are based on the values of the molecular descriptors in your training set, then I think the number and type of compounds in the dataset have a big influence.
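To make the a posteriori case concrete, here is a minimal sketch of one common descriptor-based AD definition, the leverage approach. It is not necessarily what was used for the 45-compound model: the function name `leverage_domain` and the random descriptor matrix are assumptions made up for illustration. The warning leverage h* = 3(p+1)/n is the usual rule of thumb, with p descriptors and n training compounds, so the threshold itself depends directly on the size of the training set.

```python
import numpy as np

def leverage_domain(X_train, X_query):
    """Leverage-based AD check (illustrative sketch, not a validated tool).

    X_train: (n, p) descriptor matrix of the training set.
    X_query: (m, p) descriptor matrix of the compounds to check.
    Returns the leverages, the warning leverage h*, and an in-domain mask.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    n, p = X_train.shape

    # Add an intercept column, as in an ordinary linear QSAR model.
    A = np.hstack([np.ones((n, 1)), X_train])
    Q = np.hstack([np.ones((X_query.shape[0], 1)), X_query])

    # h_i = q_i^T (A^T A)^-1 q_i; pinv guards against collinear descriptors.
    core = np.linalg.pinv(A.T @ A)
    h = np.einsum("ij,jk,ik->i", Q, core, Q)

    # Rule-of-thumb warning leverage: shrinks as n grows.
    h_star = 3.0 * (p + 1) / n
    return h, h_star, h <= h_star

# Hypothetical data: 45 compounds, 3 descriptors (random numbers, not real chemistry).
rng = np.random.default_rng(0)
X = rng.normal(size=(45, 3))
h, h_star, inside = leverage_domain(X, X)
print(f"h* = {h_star:.3f}")  # 3 * (3 + 1) / 45, about 0.267
print(f"{inside.sum()} of {len(inside)} training compounds have h <= h*")
```

With the same three descriptors, a training set of 90 compounds would give h* = 0.133, but each compound's leverage would also change because the descriptor covariance is re-estimated, so the net effect on the AD depends on which compounds are added.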
I am afraid that without more specific information on the nature of your QSAR model, the endpoint predicted, the descriptors used and the chemicals in the training set, it is hard to give a meaningful answer.
In general terms: yes, in many but not all cases there will likely be an influence, and its extent will vary. By the way, I just found a good summary on the subject on the internet, which I attach to this post.
Note that the definition of an AD relates, at least implicitly, to both chemical (structural) space and descriptor space. Having more chemicals in the training set will in many cases lead to a refined definition of the AD in chemical space. Potentially (but I guess not necessarily) it will also alter the distribution of the training set chemicals with respect to the descriptor(s) used in the model.
Also, ADs will often be unevenly covered by the training set chemicals (i.e. some areas of the AD are more densely covered than others, which in my recollection is e.g. discussed in ref. 4 cited in the attached paper). Adding chemicals from the "blind spots" (sparsely populated or unpopulated areas) will have a greater influence than adding chemicals with properties already present in the training set.
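The "blind spot" effect can be illustrated with a distance-based AD measure. The following is only a sketch under assumptions of my own: the mean Euclidean distance to the k nearest training compounds is used as a stand-in for local coverage, and the function name `mean_knn_distance` and the toy 2-descriptor coordinates are hypothetical.

```python
import numpy as np

def mean_knn_distance(X_train, x_query, k=2):
    """Mean Euclidean distance from x_query to its k nearest training compounds.

    A small value means the query sits in a well-covered region of descriptor space.
    """
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    d = np.linalg.norm(X_train - x_query, axis=1)
    return float(np.sort(d)[:k].mean())

# Toy training set clustered in one corner of a 2-descriptor space.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
query = np.array([5.0, 5.0])  # lies in an unpopulated area

before = mean_knn_distance(X_train, query)
# Add one compound inside the already dense cluster.
after_dense = mean_knn_distance(np.vstack([X_train, [0.5, 0.5]]), query)
# Add one compound in the blind spot, close to the query.
after_sparse = mean_knn_distance(np.vstack([X_train, [5.0, 4.0]]), query)

print(before, after_dense, after_sparse)
```

In this toy case the compound added to the dense cluster barely changes the distance for the query, while the compound added near the blind spot reduces it substantially. Whether the query then counts as in domain depends on the distance cutoff chosen, which is not specified here.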
On the other hand, whether extending the training set improves, worsens or leaves unchanged the global predictive performance of your model (in terms of specificity and sensitivity) depends on the composition of the test set used to validate these figures of merit. For example, does it contain chemicals that were in domain before the refinement of the AD but would be out of domain afterwards? Of course, ideally the test set should fully reflect the AD, but it doesn't always do that.
Likewise, a prediction for an individual chemical might or might not be affected.
If a QSPR/QSAR model is built from a large number of compounds, its domain of applicability will naturally be larger (wider) than that of a QSPR/QSAR model built from a small number of compounds.
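This holds strictly for the simplest AD definition, the descriptor range (bounding box) approach, when the larger set contains the smaller one. A minimal sketch, assuming a range-based AD; the function names and the random data are hypothetical, and other AD definitions (e.g. leverage or density based) need not behave this way.

```python
import numpy as np

def descriptor_ranges(X):
    """Per-descriptor minimum and maximum over the training set."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)

def in_range_domain(X_train, x_query):
    """True if every descriptor of x_query lies within the training ranges."""
    lo, hi = descriptor_ranges(X_train)
    x = np.asarray(x_query, dtype=float)
    return bool(np.all((x >= lo) & (x <= hi)))

# Hypothetical descriptors: a 45-compound set and a 10-compound subset of it.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(45, 3))
X_small = X_full[:10]

lo_s, hi_s = descriptor_ranges(X_small)
lo_f, hi_f = descriptor_ranges(X_full)

# The box of the full set always encloses the box of any subset.
print("range widths, 10 compounds:", hi_s - lo_s)
print("range widths, 45 compounds:", hi_f - lo_f)
```

Note that the guarantee only covers nested sets: 45 close analogues from one chemical class can span a narrower box than 10 structurally diverse compounds.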