How credible is a docking score as a measure of affinity for a particular receptor?
Does a higher score mean better affinity? A docking score says nothing about how long the ligand stays bound to the receptor, so how can it be used to explain potency?
Scoring functions, by default, have no information about the unbound state of the ligand and its target.
So binding free energies cannot be reliably estimated from a docking score.
It makes more sense to use docking scores only to separate active ligands from inactive ones; beyond that, we should rely on binding assays.
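If you do use scores as a classifier, one common way to check how well they separate actives from inactives is the ROC AUC on a set of compounds with known activity. The sketch below is only an illustration: the function name `roc_auc`, the scores, and the labels are all made up, and it assumes that a lower (more negative) score means a better predicted binder.

```python
def roc_auc(scores, is_active):
    """Probability that a randomly chosen active is ranked ahead of a
    randomly chosen inactive. Assumes lower score = better predicted binder."""
    actives = [s for s, a in zip(scores, is_active) if a]
    inactives = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")

    wins = 0.0
    for a in actives:
        for i in inactives:
            if a < i:        # active scored better than inactive
                wins += 1.0
            elif a == i:     # ties count as half a win
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical docking scores (kcal/mol) and assay labels, for illustration only.
example_scores = [-9.5, -8.7, -8.1, -7.9, -7.2, -6.4]
example_labels = [True, False, True, False, False, False]
print(roc_auc(example_scores, example_labels))
```

An AUC near 0.5 means the score ranks actives no better than chance; values closer to 1.0 mean actives tend to be ranked first.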
Docking gives you the protein-ligand complex, with the ligand bound in the experimentally determined active site, or in a predicted active site in the case of homology modelling. You need in vitro studies to check the potency.
However, you can also run molecular dynamics simulations to check the stability of the protein-ligand complex.
There is no one-to-one correlation between a docking score and the corresponding binding affinity. Moreover, the suggestions from Mohammad and Anu are worth following.
Most docking scores are predicted values of the free energy of protein-ligand binding, i.e. the affinity (most often expressed in kcal/mol). They are not meant to be accurate affinity predictors. The intended use of docking scores is in virtual screening, where they are used to rank docked compounds for subsequent procurement and experimental testing of the top scorers. In general, 0 to 10% of experimentally tested virtual hits turn out to be true (moderately potent) binders to the protein of interest.
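To see why a score on a kcal/mol scale is a poor affinity predictor, it helps to convert it to a dissociation constant with the standard relation ΔG = RT ln(Kd). The sketch below assumes the score can be read as a standard binding free energy (1 M reference state) at 298.15 K, which is an idealisation; the function name `score_to_kd` is just for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def score_to_kd(dg_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to Kd in mol/L using
    dG = RT ln(Kd). Assumes the score really is a standard free energy."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


# A score of -8 kcal/mol corresponds to a Kd on the order of a micromolar.
kd = score_to_kd(-8.0)
print(f"Kd for -8.0 kcal/mol: {kd:.2e} M")

# An error of RT*ln(10), about 1.36 kcal/mol at room temperature,
# changes the implied Kd by a factor of ten.
ten_fold = R_KCAL * 298.15 * math.log(10)
print(f"kcal/mol per 10-fold change in Kd: {ten_fold:.2f}")
```

Since scoring-function errors are often larger than 1 to 2 kcal/mol, the implied Kd can easily be off by one or more orders of magnitude, which is why the scores are better used for ranking than for quoting affinities.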
This is the eternal problem with docking scores. You can find several papers on this topic. Dmitri's comments are right. The experimental hit rate of virtual screening is very low (1%). This is why some groups do not rely on a single score or criterion. It is now very common to use 2 or 4 scores to select the top compounds for experimental testing.
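One simple way to combine several scores is "rank-by-rank" consensus: rank the compounds under each scoring function separately, then average the ranks. This is only one of several possible consensus schemes, not necessarily what any given group uses. The function name, scoring-function names, and numbers below are hypothetical, and lower score is assumed to mean a better predicted binder.

```python
def consensus_rank(score_table):
    """Average each compound's rank across several scoring functions.

    score_table maps scoring-function name -> {compound: score}.
    Assumes lower score = better and that every function scored the same
    compounds. Ties within one function are broken by compound name.
    Returns a list of (mean_rank, compound), best first.
    """
    compounds = sorted(next(iter(score_table.values())))
    rank_sum = {c: 0 for c in compounds}

    for scores in score_table.values():
        ordered = sorted(compounds, key=lambda c: scores[c])
        for rank, compound in enumerate(ordered, start=1):
            rank_sum[compound] += rank

    n_functions = len(score_table)
    return sorted((rank_sum[c] / n_functions, c) for c in compounds)


# Hypothetical scores from three hypothetical scoring functions.
table = {
    "score_a": {"cpd1": -9.1, "cpd2": -8.0, "cpd3": -7.5},
    "score_b": {"cpd1": -6.0, "cpd2": -7.2, "cpd3": -5.1},
    "score_c": {"cpd1": -8.8, "cpd2": -8.5, "cpd3": -9.0},
}
for mean_rank, compound in consensus_rank(table):
    print(compound, round(mean_rank, 2))
```

Averaging ranks rather than raw scores avoids having to put scoring functions with different units and ranges on a common scale.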
There is much room for improvement in scoring functions, given that a null model using clogP alone to rank compounds appeared among the top-scoring methods in the challenge below.
Gaieb, Z., Liu, S., Gathiaka, S., Chiu, M., Yang, H., Shao, C., … Amaro, R. E. (2018). D3R Grand Challenge 2: blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies. Journal of Computer-Aided Molecular Design, 32(1), 1–20. https://doi.org/10.1007/s10822-017-0088-4
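If you want to run this kind of sanity check on your own data, you can compare the ranking from a null model (clogP here) with the measured activities using a rank correlation such as Kendall's tau, and then do the same for your docking scores. The sketch below computes tau-a (no tie correction) in plain Python; the clogP and pIC50 values are invented toy numbers, not data from the challenge.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.
    Pairs tied in x or y count as neither concordant nor discordant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy values for illustration only: null-model predictor vs measured activity.
clogp = [1.2, 2.5, 3.1, 3.8, 4.4]
pic50 = [5.0, 5.6, 5.4, 6.3, 6.9]
print(kendall_tau_a(clogp, pic50))
```

When applying the same function to docking scores, negate them first if lower scores mean better predicted binding, so that both inputs increase with predicted affinity. If the docking score does not beat the clogP null model on your target, the score is adding little beyond a trivial descriptor.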