Sometimes researchers apply statistical inference to samples that were selected intentionally, which many authors consider a mistake. Must statistical inference always be applied to a probabilistic (random) sample selection?
I agree with 'YES, in most cases' [Frank B. K. Twenefour]. I would say 'in general, yes'.
Let us first define Statistical Inference:
"The theory, methods, and practice of forming judgments about the parameters of a population, usually on the basis of random sampling".
"Statistical inference means drawing conclusions based on data" [Duke University]
Another way of looking at this is: one cannot 'infer' that which is already 'known'. If the samples were not selected at random but were 'intentionally selected', then there is no 'inference'; rather, there was knowledge about them to begin with, which is what caused someone to select those particular samples in the first place. If this is true, then the objectivity of randomness is negated and the inference is less reliable.
Random sampling justifies the deductions and assumptions used in inference.
Without a random sample, one gets poor deductions and poor assumptions.
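A small simulation may help illustrate this. Everything below (the population, its distribution, and the sample sizes) is hypothetical; the point is only that a random sample supports an approximately unbiased estimate of the mean, while an intentionally selected sample does not:

```python
import random

random.seed(42)

# Hypothetical skewed population (e.g., 10,000 establishment sizes).
population = [random.lognormvariate(0, 1) for _ in range(10_000)]
true_mean = sum(population) / len(population)

# Random (probability) sample: every unit has a known chance of
# selection, so the sample mean estimates the population mean well.
random_sample = random.sample(population, 200)
random_est = sum(random_sample) / len(random_sample)

# Intentional selection: taking only the largest units, with no model
# to account for that choice, badly biases the estimated mean.
cutoff_sample = sorted(population, reverse=True)[:200]
cutoff_est = sum(cutoff_sample) / len(cutoff_sample)

print(f"true mean:         {true_mean:.2f}")
print(f"random-sample est: {random_est:.2f}")  # close to the true mean
print(f"cutoff-sample est: {cutoff_est:.2f}")  # far above the true mean
```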
I hope I have offered a 'reflective' perspective on statistical inference.
Thank you Alain Manuel Chaple Gil, for your question!
Respectfully,
Jeanetta Mastron
On another note: if anyone is looking for the counterargument to the 'yes' answers to this question, they may find their 'no' answers in the blog post by Allen Downey,
"Statistical inference is only mostly wrong", March 2, 2015 - see below link
As Jaap said, there are design-based sampling and estimation methodologies and model-based (regression) methodologies, but also a combination: model-assisted design-based methodologies. This would be in the area of survey statistics, where continuous data applications dominate. See
Brewer, KRW (2002), Combined survey sampling inference: Weighing Basu's elephants, Arnold: London and Oxford University Press. (design-based and model-based combined)
Cochran, W.G.(1977), Sampling Techniques, 3rd ed., John Wiley & Sons. (mostly design-based except for a section under ratio estimation)
Lohr, S.L.(2010), Sampling: Design and Analysis, 2nd ed., Brooks/Cole. (Comparable to Cochran)
Särndal, C.-E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag.
Purely model-based methods can be used when there are good regressor data. See
In any case, stratification is often useful - perhaps even more for model-based methods.
Models can be helpful with small area estimation, often using Bayesian statistics, but not in the case shown here, where we only need to borrow strength:
There is a great deal more on my ResearchGate page:
https://www.researchgate.net/profile/James_Knaub
Much of the energy data here uses my methodologies:
http://www.eia.gov/
Reports there, such as the Electric Power Monthly depend upon this. See sales and revenue for a good example, and similarly under the Natural Gas Monthly. In many other cases, there is nothing else practical that can be done.
A major key is good regressor data for the population.
Cheers - Jim
Article Efficacy of Quasi-Cutoff Sampling and Model-Based Estimation...
Article Using Prediction-Oriented Software for Survey Estimation
Conference Paper Projected Variance for the Model-based Classical Ratio Estim...
Article Quasi-Cutoff Sampling and Simple Small Area Estimation with ...
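As a toy illustration of what "borrowing strength" across small areas means (my own sketch with made-up numbers, not taken from any of the papers above): each area's direct sample mean is shrunk toward the overall mean, trading a little bias for a variance reduction that matters when area sample sizes are tiny. A fixed shrinkage weight is used here; in practice it would be estimated, often with Bayesian or empirical-Bayes methods.

```python
import random

random.seed(5)

# Hypothetical small areas: true means and four noisy observations each.
area_true = [10, 12, 8, 11, 9]
samples = [[t + random.gauss(0, 4) for _ in range(4)] for t in area_true]

area_means = [sum(s) / len(s) for s in samples]          # direct estimates
overall = sum(sum(s) for s in samples) / sum(len(s) for s in samples)

# Shrink each direct estimate toward the overall mean ("borrow strength").
w = 0.5  # illustrative shrinkage weight; normally estimated from the data
shrunk = [w * m + (1 - w) * overall for m in area_means]

mse_direct = sum((m - t) ** 2 for m, t in zip(area_means, area_true)) / 5
mse_shrunk = sum((s - t) ** 2 for s, t in zip(shrunk, area_true)) / 5
print(f"direct MSE: {mse_direct:.2f}")
print(f"shrunk MSE: {mse_shrunk:.2f}")
```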
I noticed that the author, Dick J. Brus, did a great deal of related work with Jaap de Gruijter, available through each of their ResearchGate profiles.
Article Design-based and model-based sampling strategies for soil monitoring
Thank you. I especially found "Design-based and model-based sampling strategies for soil monitoring," by Brus to be very interesting. It is also short, so convenient for a quick look into that application for soil science, which I expect has other applications.
Because the paper is short, I will just note that, when comparing design-based to model-based sampling in this reference, he succinctly framed it as a tradeoff between "validity" and "efficiency," by which, I think, he essentially (or to a large extent) meant "bias" and "variance," respectively.
In Cochran (1977), Sampling Techniques, Wiley, page 158, he discusses the source of the term "model-unbiased," which I believe explains why MSE and variance are used interchangeably for model-based inference. But as I once saw in a preliminary paper by Galit Shmueli, I think you can view model-failure as the source of the actual bias, resulting from the fact that a model is never exactly correct. The estimation of coefficients and the residuals are two parts of the estimated variance of the prediction error for the model, which can be estimated simultaneously with the prediction. The bias, however, comes from model-failure, which can only be studied through test data. (I very recently suggested such a use of test data for another question on ResearchGate on omitted variable bias, which would be one reason for model-failure. For a look at model-failure in a simpler application, where it is generally not a problem, see the paper on that topic at the attached link. For establishment surveys, which are highly skewed, model-based inference for cutoff or quasi-cutoff [multiple attributes] sampling is extremely efficient, using stratification when necessary to ensure that models are applied to homogeneous, though heteroscedastic, categories.)
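To make the model-based estimation idea concrete, here is a minimal sketch of the classical ratio estimator under cutoff sampling. The population, the model, and the cutoff rule are all made up for illustration; the key assumption, as noted elsewhere in this thread, is good regressor data for the whole population (e.g., a prior census):

```python
import random

random.seed(1)

N = 1000
# Regressor data known for every unit in the population.
x = [random.lognormvariate(2, 1) for _ in range(N)]
# Hypothetical model y_i = b*x_i + e_i, heteroscedastic in x.
y = [2.5 * xi + random.gauss(0, xi ** 0.5) for xi in x]
true_total = sum(y)

# Cutoff sample: y is observed only for the 100 largest units by x.
order = sorted(range(N), key=lambda i: x[i], reverse=True)
sample = set(order[:100])

# Classical ratio estimator: slope from the sample, predictions for
# the unobserved (below-cutoff) units.
b_hat = sum(y[i] for i in sample) / sum(x[i] for i in sample)
est_total = sum(y[i] for i in sample) + b_hat * sum(
    x[i] for i in range(N) if i not in sample
)

print(f"relative error: {abs(est_total - true_total) / true_total:.3%}")
```

Note that no probability design is involved: the randomness enters only through the model's error term, which is what supports variance estimation here.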
A model-based approach gives you flexibility to use your resources where they are needed, perhaps more so than a probability design, and I think that Brus hints at that, which can also be seen to happen in the attached link to a paper on sample size requirements, particularly for establishment surveys.
Having expended a great deal of effort developing model-based estimates for practical solutions for establishment surveys, I found the parallels with this soil sampling, space-time oriented paper, to be very interesting.
Note that Brus states that "... randomness is introduced via the model..." which is very much like the way it can be interpreted in the usual survey statistics.
One more point: note that with surveys, the distinction between design-based versus model-based sampling and design-based versus model-based estimation is important. You cannot use design-based estimation if you do not have a design-based (probability-selection-based) sample. When someone uses model-assisted sampling, they generally mean that the survey design weights may be modified essentially by estimation techniques, which determine how the survey weights might be "calibrated" (i.e., modified). Here is a good reference for model-assisted approaches:
Särndal, C.-E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag.
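As a toy illustration of the calibration idea (my own sketch, not from Särndal et al.): starting from design weights for a simple random sample, a single scaling factor is chosen so that the calibrated weighted x-total matches a known population x-total. This is the simplest (ratio) case of calibrating survey weights:

```python
import random

random.seed(7)

N = 5000
# Hypothetical population with a known regressor total.
x_pop = [random.lognormvariate(1, 0.8) for _ in range(N)]
y_pop = [3.0 * xi + random.gauss(0, 1) for xi in x_pop]
X_known = sum(x_pop)  # known population x-total

n = 250
idx = random.sample(range(N), n)  # probability (SRS) sample
base_w = N / n                    # design weight under SRS

# Ratio calibration: scale every design weight by one factor g so the
# calibrated weighted x-total equals X_known exactly.
g = X_known / (base_w * sum(x_pop[i] for i in idx))
calib_est = sum(g * base_w * y_pop[i] for i in idx)

true_total = sum(y_pop)
print(f"relative error: {abs(calib_est - true_total) / true_total:.3%}")
```

Because y is strongly related to x here, calibrating on x sharply reduces the variance of the estimated y-total relative to the plain design-weighted estimate.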
Cheers - Jim
Article On Model-Failure When Estimating from Cutoff Samples
Conference Paper Projected Variance for the Model-based Classical Ratio Estim...
I think James has given very valuable references. I will add a few words about an application in forestry. In forest inventory and monitoring (James talked about soil monitoring above), both probability sampling and purposive sampling are used in practice. The national forest inventory (NFI) in China is conducted based on systematic sampling, and the statistics or inventory results are published periodically. For the establishment of the volume and biomass models applied in the NFI, however, purposive sampling is used to select the sample trees. In other words, the sample trees are selected intentionally, not randomly. Of course, they are not selected as one pleases or at will; rules must be followed to serve the purpose. We sometimes call the volume and biomass models "super-population models," meaning they can be used in any population, not limited to the population of one particular probability sample. They can be used at various scales, from the regional and provincial levels down to the county level, and even to the stand level. For these reasons, the prediction errors of volume and biomass models should be very low, for example less than 3% or 5%.
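To illustrate the idea of fitting a "super-population" model from purposively selected sample trees, here is a hypothetical sketch: the allometric form, the coefficients, and the selection rule are all made up for illustration, not taken from the Chinese NFI. The sample trees are chosen deliberately to span the diameter range, and the fitted model is then applied to the whole population:

```python
import math
import random

random.seed(3)

# Simulated tree population: diameters (cm) and biomass following an
# allometric law B = a * D**b with multiplicative error.
a_true, b_true = 0.12, 2.4
diameters = [random.uniform(5, 60) for _ in range(2000)]
biomass = [a_true * d ** b_true * math.exp(random.gauss(0, 0.1))
           for d in diameters]

# Purposive sample: 30 trees chosen by rule to span the diameter range
# (an intentional, not random, selection).
targets = [5 + k * (55 / 29) for k in range(30)]
sample = [min(range(2000), key=lambda i: abs(diameters[i] - t))
          for t in targets]

# Fit ln(B) = ln(a) + b * ln(D) by least squares on the sample trees.
lx = [math.log(diameters[i]) for i in sample]
ly = [math.log(biomass[i]) for i in sample]
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
b_hat = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
a_hat = math.exp(my - b_hat * mx)

# Apply the fitted model across the whole population, at any scale.
pred_total = sum(a_hat * d ** b_hat for d in diameters)
true_total = sum(biomass)
print(f"relative error: {abs(pred_total - true_total) / true_total:.2%}")
```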