Each statistical test has certain conditions which, when met, are sufficient to carry out the test. The sampling technique is not one of those conditions.
While inferential statistics like correlation and regression are generally associated with probability sampling, they can be used with non-probability sampling methods like convenience or judgment sampling. However, the results may have limitations in generalizability to the broader population due to sampling bias. It's crucial to acknowledge and communicate these limitations when drawing conclusions. Despite potential biases, such analyses can still provide insights within the studied sample, offering valuable information for specific contexts. Careful interpretation and cautious extrapolation are essential when applying inferential statistics to findings derived from non-probability sampling techniques.
To paraphrase Lord (1953), the numbers do not know how they were created, so you can apply these statistical techniques provided you have two variables with matched numbers. What inferences you make from the results will depend on how the numbers came about, your research questions, and your assumptions.
I see a lot of answers saying you can, which by themselves are not wrong. However, closer attention to effective representation is important when using inferential methodologies.
As indicated by others, inference to a population indicates some kind of 'representativeness.' Random sampling is used in survey sampling to hopefully obtain some degree of representation of a population, and in design-based sampling it is used as the foundation for inference. Regression models, however, are used in other areas of statistics, and the prediction approach has become popular for surveys also, in the last half century or so. But when regression models are used, you want to be reasonably sure that the model used applies to the entire population or subpopulation to which you are to infer.
I have used quasi-cutoff sampling (generally cutoff or near cutoff sampling by establishment, with multiple data items such that each item is not likely sampled in a strict cutoff sample, but each has its own size measure in the 'prediction' phase). In quasi-cutoff sampling, there can be substantial bias, but in my experience, this is a result of failure to realize that more than one model should have been used. That is, data were mixed which followed different models, and with a little knowledge of the data, one could have avoided the problem. It is like knowing any population well enough to stratify properly.
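To illustrate the point about mixed models, here is a minimal numeric sketch (the groups, ratios, and noise levels are invented for illustration, not taken from any real survey): pooling two groups that follow different ratio models gives a single ratio that is wrong for both, while fitting one model per group recovers each ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical establishment data: two groups whose y/x ratios differ.
# Group A follows y ≈ 2x, group B follows y ≈ 5x (values are illustrative only).
x_a = rng.uniform(10, 100, 200)
y_a = 2.0 * x_a + rng.normal(0, 5, 200)
x_b = rng.uniform(10, 100, 200)
y_b = 5.0 * x_b + rng.normal(0, 5, 200)

# One pooled ratio model forced on the mixed data
pooled_ratio = (y_a.sum() + y_b.sum()) / (x_a.sum() + x_b.sum())

# Separate ratio models, one per group
ratio_a = y_a.sum() / x_a.sum()
ratio_b = y_b.sum() / x_b.sum()

print(f"pooled ratio:  {pooled_ratio:.2f}")   # roughly midway -- wrong for both groups
print(f"group A ratio: {ratio_a:.2f}")        # close to 2
print(f"group B ratio: {ratio_b:.2f}")        # close to 5
```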
As for convenience or judgment sampling, work has been done to infer using them as well, but I think that in general it will be less successful, as multiple covariates are likely needed, and the more covariates that are needed, the less likely, I believe, that you will find a combination that will do the job well, even if you have the covariate data to the extent needed, which is also unlikely. Work, nevertheless, has been done to model using these covariates, as well as to produce pseudo-weights for quasi-random sampling that make use of your nonprobability sample. That also likely requires multiple covariates. For example, see Valliant, R., Dever, J. A., and Kreuter, F. (2018), Practical Tools for Designing and Weighting Survey Samples, 2nd ed., Springer, Chapter 18, "Nonprobability Sampling," Section 18.4, "Approaches to Inference." They discuss the model-based approach, quasi-random sampling, and a combination approach.
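For readers unfamiliar with pseudo-weights, here is a rough sketch of the quasi-randomization idea under simplified assumptions (the covariates, sample sizes, design weights, and the simple inverse-propensity weighting are all illustrative choices, not the specific procedure in Valliant, Dever, and Kreuter): stack the nonprobability sample with a reference probability sample, model the propensity of turning up in the nonprobability sample, and invert the estimated propensities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical data: a nonprobability (e.g. convenience) sample and a small
# reference probability sample with known design weights.
X_nonprob = rng.normal(1.0, 1.0, size=(500, 2))   # covariates, skewed toward high values
X_ref     = rng.normal(0.0, 1.0, size=(200, 2))   # covariates from the reference sample
w_ref     = np.full(200, 50.0)                     # design weights, roughly summing to the population size

# Stack the samples; label 1 = nonprobability sample, 0 = reference sample.
X = np.vstack([X_nonprob, X_ref])
z = np.concatenate([np.ones(500), np.zeros(200)])
# Reference units carry their design weights so they stand in for the population.
w = np.concatenate([np.ones(500), w_ref])

# Estimated propensity of a unit with these covariates appearing in the nonprobability sample.
model = LogisticRegression().fit(X, z, sample_weight=w)
p = model.predict_proba(X_nonprob)[:, 1]

# Pseudo-weights: inverse of the estimated propensities (one common variant).
pseudo_w = 1.0 / p
print(pseudo_w[:5])
```

The pseudo-weights can then be used much like design weights, with all the caveats about bias discussed above.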
Note that in statistical learning, and elsewhere, it is well known that too many variables can lead to increased variance, and too few can cause bias. (Refer to the "Bias-Variance Tradeoff.") We do not want to overfit or underfit when modeling. But, as I noted, I think that if you need a number of covariates, that is problematic, both in theoretically obtaining a good model and in the practical sense of obtaining good data. I have found, studying heteroscedasticity, that expected heteroscedasticity may not occur when the model is more complex. However, oversimplifying the model can introduce bias, so that has to be avoided as well. Bias also results if you try to place the whole population, or a larger subpopulation than warranted, under one model that does not describe all of that population or subpopulation. This is why model assessment against data not used to form the model is important in statistical learning. (See Chapter 7, "Model Assessment and Selection," in Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, 2nd ed., Springer, 2009, corrected 2013.)
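As a small illustration of checking for under- and overfitting against held-out data (the quadratic data-generating model and the polynomial degrees are arbitrary choices made for this sketch):

```python
import numpy as np

rng = np.random.default_rng(2)

# Data from a quadratic relationship with noise (illustrative only).
x = rng.uniform(-3, 3, 120)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0, 1.0, 120)

# Hold out part of the data for assessment, in the spirit of Hastie et al., Chapter 7.
x_train, y_train = x[:80], y[:80]
x_test,  y_test  = x[80:], y[80:]

for degree in (1, 2, 10):
    coefs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    mse_test  = np.mean((np.polyval(coefs, x_test)  - y_test)  ** 2)
    # Degree 1 underfits (bias), degree 10 overfits (variance);
    # degree 2 should do best on the held-out data.
    print(f"degree {degree:2d}: train MSE {mse_train:5.2f}, test MSE {mse_test:5.2f}")
```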
So as others have noted, regression can be used without random sampling, but you have to beware of inference to a population that may not have been represented well by your sample. Random sampling is not a requirement for the prediction approach, but it may be helpful. The balanced sample approach, based on the known predictors for the population, may provide the representativeness that random sampling only hopes to provide, and an appropriate model for a group of data can do well without random sampling. I made extensive use of quasi-cutoff sampling where the small unsampled cases were known, from past census data and other testing, not to depart substantially from the model, and later annual census data could be used to check totals from monthly samples. This worked very well. However, convenience and other nonprobability sampling may present a much bigger challenge. Bias is the problem to be considered.
Note that above I said that "...when regression models are used, you want to be reasonably sure that the model used applies to the entire population or subpopulation to which you are to infer." That is the prediction approach version of 'representativeness.'
However, since a model will not be exactly correct, that introduces a form of bias. (George E. P. Box: "All models are wrong, but some are useful.") You can fit a model where the expected sum of the residuals is zero, so it is "model-unbiased" (see Cochran (1977), Sampling Techniques, 3rd ed., Wiley, page 158), and it still won't be exactly right, though as Cochran says on page 160, when a ratio estimator looks appropriate, it "...will be hard to beat." This can particularly be true when the predictor variable is the same variable from a previous census, as indicated in Cochran (1953), Sampling Techniques, 1st ed., Wiley, pages 205-206, Section 9.9, "Measures of the size of a unit." In such ratio-model cases, where scatterplots confirm the behavior of the data from one census to another, and other testing is used, bias will not be a real issue, and cutoff or quasi-cutoff sampling for establishment surveys will greatly reduce variance. However, for the convenience and other nonprobability sampling obtained in most social surveys today, more covariates may be needed, a good model may be hard to construct, covariate data may be hard to obtain, testing may not be feasible, and thus bias is liable to be a much, much bigger issue, with unknown ramifications.
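As a concrete, hypothetical illustration of the ratio-estimator case (the population, the size measure, and the cutoff are all simulated, not drawn from any actual survey), here is a sketch of estimating a population total from a cutoff-style sample when the previous-census value serves as the predictor:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical establishment population: x = value from the previous census,
# y = current value, roughly proportional to x (ratio model y ≈ R * x).
N = 1000
x_pop = rng.lognormal(mean=3.0, sigma=1.0, size=N)
y_pop = 1.1 * x_pop * (1 + rng.normal(0, 0.05, N))

# Cutoff-style sample: take the largest establishments by the size measure x.
order = np.argsort(x_pop)[::-1]
sample = order[:150]

# Ratio estimator of the population total of y, using the known census total of x.
R_hat = y_pop[sample].sum() / x_pop[sample].sum()
y_total_hat = R_hat * x_pop.sum()

print(f"estimated total: {y_total_hat:,.0f}")
print(f"actual total:    {y_pop.sum():,.0f}")
```

When the small unsampled cases really do follow the same ratio model as the sampled ones, the estimate comes out close to the actual total despite the deliberately non-random sample; when they follow a different model, the bias shows up directly in the estimated total.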
Although there are some good answers about what you should consider if you want to make inferences about the target population, there is something that has not been mentioned explicitly yet (or I missed it; if so, sorry).
Correlation and regression are not per se inferential techniques. The correlation coefficient and the regression coefficient are only descriptive statistics and can always be calculated, since they are only sample statistics (whether it makes sense to use a specific coefficient depends on other assumptions than the inferential part, e.g. the Pearson correlation is only suitable for linear relationships). If you want to make inferences about the target population, you need the inferential part, which has additional assumptions, e.g. about the distribution of your parameter of interest, in order to calculate appropriate confidence intervals or similar things.
The inferential part is correct if you have a random sample from the target population (and the other assumptions are met). If not, the inferential part cannot be guaranteed to give a correct result. This depends on other considerations described by others above, e.g. avoiding or accounting for bias.
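To make that distinction concrete, here is a short sketch on simulated data (the Fisher z interval is just one standard choice for the inferential part): the correlation coefficient itself is purely descriptive, while the confidence interval is the step whose validity rests on distributional and sampling assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated paired data (illustrative only).
n = 60
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)

# Descriptive part: the sample correlation coefficient can always be computed.
r = np.corrcoef(x, y)[0, 1]

# Inferential part: a 95% confidence interval via the Fisher z transformation.
# Its validity rests on assumptions (approximate bivariate normality, and a
# sample that actually represents the target population).
z = np.arctanh(r)
se = 1.0 / np.sqrt(n - 3)
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)

print(f"r = {r:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```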
There are two ways to make an inference from a sample to a population:
1) the probability-of-selection-based/randomized or design-based approach, and 2) the prediction-based/model-based approach. There are also combinations, in a way.
For number 2, consider
Chambers, R., and Clark, R. (2012), An Introduction to Model-Based Survey Sampling with Applications, Oxford Statistical Science Series
and
Valliant, R., Dorfman, A. H., and Royall, R. M. (2000), Finite Population Sampling and Inference: A Prediction Approach, Wiley Series in Probability and Statistics
Inferential statistics are statistical methods that help researchers draw conclusions and make predictions based on their data.
Applying inferential statistics enables a researcher to make estimates about populations and to test hypotheses in order to draw conclusions about populations.
Yes, inferential statistics such as correlation and regression can still be used with non-probability sampling techniques like convenience or judgment sampling. However, it is important to note that the results may not be as generalizable to the larger population as they would be with a probability sampling technique.
Non-probability sampling methods can introduce bias and may not accurately represent the population of interest. Therefore, caution should be taken when interpreting the results of inferential statistics with non-probability samples. It is also important to consider the limitations of the sampling method when drawing conclusions from the data.