Most common sample size formulas assume that we measure only one variable (e.g. Tohmas thempson), but in a typical questionnaire we have many variables. How should such formulas be applied when the survey has to measure many variables at once?
There are two methods for multivariable allocation and multivariable sampling.
"Efficient Balanced Sampling: The Cube Method" by Deville and Tillé
http://www.jstor.org/stable/20441151 There is R software for this. It does not allow direct variance estimation and uses martingales. At one point I was familiar with their method. Creecy and Klein used some of the software in a controlled rounding attempt for one of their applications, generating synthetic data intended to round to Census published tables: they could generate the synthetic data but could not get it to round. My methods (below) give some insight into the difficulty of the problem, which I was able to overcome even though it is known that no general controlled rounding algorithm in 3+ dimensions can exist.
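For reference, the cube method is available in R, e.g. the samplecube function in the 'sampling' package (the 'BalancedSampling' package is another implementation). A minimal sketch, using a made-up frame and equal inclusion probabilities; all sizes and variable names below are purely illustrative:

```r
# Minimal sketch of balanced (cube-method) sampling with the 'sampling' package.
# The frame, variable names, and sample size are invented for illustration.
library(sampling)

set.seed(1)
N <- 1000                                  # hypothetical population size
frame <- data.frame(
  x1 = rgamma(N, shape = 2, scale = 10),   # hypothetical auxiliary variables
  x2 = runif(N, 0, 50)
)

n   <- 100                                 # desired sample size
pik <- rep(n / N, N)                       # equal inclusion probabilities
# Balance on the inclusion probabilities themselves (fixed n) and on x1, x2:
X <- cbind(pik, frame$x1, frame$x2)

s <- samplecube(X, pik, order = 1, comment = FALSE)  # 0/1 sample indicator
sample_units <- frame[s == 1, ]
sum(s)  # close to n; HT totals of x1 and x2 approximately reproduce the frame totals
```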
Tillé also has an excellent book, Sampling Algorithms, that a number of us have been through. It gives an exceptionally strong presentation of several algorithms.
http://www.census.gov/srd/papers/pdf/rrs2009-08.pdf This allows direct variance estimation. It also gives methods for controlled rounding (among the four components of the method that I cover), extending the Cox, Causey, and Ernst (JASA 1985) approach from two dimensions to three or more. Controlled rounding software is exceptionally difficult to write.
It is useful, and often necessary, to prioritize. I would decide the sample size based on the most important aspect of the survey, keeping in mind what I can afford. If there is still margin left after taking care of the key objective, then I would consider the second priority, see what sample size it needs, and increase the sample if necessary.
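To make that concrete, here is a rough numerical sketch of the prioritization idea; the margins of error, prevalences, population size, and budget below are all hypothetical:

```r
# Rough sketch of prioritizing sample size by survey objective (all numbers hypothetical).
# Required n for estimating a proportion with margin of error e at ~95% confidence:
n_for_prop <- function(p, e, N = Inf, z = 1.96) {
  n0 <- z^2 * p * (1 - p) / e^2
  if (is.finite(N)) n0 <- n0 / (1 + n0 / N)   # finite population correction
  ceiling(n0)
}

budget_n <- 600                                        # what we can afford (assumed)

n_key    <- n_for_prop(p = 0.30, e = 0.04, N = 5000)   # top-priority item
n_second <- n_for_prop(p = 0.10, e = 0.03, N = 5000)   # second-priority item
n_key; n_second

# Cover the key item first; if the budget allows, move toward the larger of
# the two requirements; otherwise accept a wider margin on the second item.
n_final <- min(budget_n, max(n_key, n_second))
n_final
```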
You would be well advised to do sample sizes for a range of scenarios. What variables have the highest and lowest prevalences? What population subgroups have the lowest prevalences? What are the questions that could produce comparisons in which there would be small numbers of participants in a cell of a table?
These will give you some idea of the analytic potential of a particular sample size. In the end, though, your sample will have one size, so it won't be ideal for everything. You can use these scenarios to figure out what questions aren't worth asking or what questions are beyond analysis because of low frequencies expected.
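One quick way to run such scenarios is to look at expected cell counts for a candidate sample size. A small sketch; the sample size, subgroup shares, and prevalences below are made up:

```r
# Expected respondents per subgroup-by-item cell for a candidate sample size
# (all shares below are hypothetical).
n <- 800
subgroup_share <- c(all = 1.00, rural = 0.25, rare_group = 0.05)
prevalence     <- c(common_item = 0.30, rare_item = 0.02)

expected_cells <- outer(subgroup_share * n, prevalence)  # expected cell counts
round(expected_cells)  # cells with only a handful of expected cases are beyond analysis
```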
If you wish to control the sample size for a single variable, then stratification can help. If you want to handle three or more variables at once, then you need methods that can produce efficient stratification for several variables simultaneously and that have theory allowing the sample size to be minimized for several variables. The methods I listed can do that.
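As a small illustration of why a single-variable rule is not enough: classical Neyman allocation, n_h = n * N_h S_h / sum(N_h S_h), computed one variable at a time can give sharply conflicting allocations. The stratum sizes and standard deviations below are invented:

```r
# Neyman allocation computed separately for two variables (hypothetical inputs):
# the two resulting allocations disagree, which is what multivariate methods address.
n   <- 500
N_h <- c(2000, 5000, 3000)          # stratum population sizes (assumed)
S1  <- c(10, 4, 2)                  # stratum SDs of variable 1 (assumed)
S2  <- c(1, 6, 12)                  # stratum SDs of variable 2 (assumed)

neyman <- function(n, N_h, S_h) round(n * N_h * S_h / sum(N_h * S_h))
rbind(var1 = neyman(n, N_h, S1),
      var2 = neyman(n, N_h, S2))    # sharply different allocations per stratum
```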
Anil's approach is not bad regardless of whether you use a probability-of-selection (design-based), prediction (model-based), or model-assisted design-based methodology, and it is good guidance for qualitative studies as well. Bill gives more rigorous options: the first paper uses auxiliary data, and the second is a multi-attribute Neyman allocation, which requires more expertise.
I used model-based sampling and prediction for small sample sizes from many small populations, for official statistics, and each attribute had its own model weights, so there was not such a problem as having a sample selection probability that is good for the estimator for one data item and terrible for another.
Regardless, you need some preliminary idea of the standard deviations. Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons, is one of a number of good sampling books, and in your case the recommended way to obtain 'estimates' of these sigmas might be a pilot study.
A pilot study has the advantage of helping you to work out details before you fully commit yourself, and might be helpful here, especially if you have a fairly extensive project.
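For a continuous item, the Cochran-style calculation that such a pilot would feed looks roughly like this; the sigma estimate, margin of error, and population size below are hypothetical:

```r
# Sample size for estimating a mean within margin e, using a pilot estimate of sigma
# (sigma_hat, e, and N are hypothetical).
sigma_hat <- 12.5        # SD estimated from the pilot
e <- 1.5                 # desired margin of error at ~95% confidence
z <- 1.96
N <- 4000                # population size

n0 <- (z * sigma_hat / e)^2
n  <- ceiling(n0 / (1 + n0 / N))   # with finite population correction
c(without_fpc = ceiling(n0), with_fpc = n)
```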
One more reference you might want to see if you have auxiliary data to guide a probability-of-selection-based study:
Holmberg, A. (2007), "Using Unequal Probability Sampling in Business Surveys to Limit Anticipated Variances of Regression Estimators," Proceedings of the Third International Conference on Establishment Surveys (Montreal, Quebec, Canada), American Statistical Association.
Of course you need to consider the kind of data you are collecting. One trap I have seen is that many people run across sample size 'calculators' on the internet which are not clearly marked as applying only to yes/no data; they usually assume the worst-case proportion in lieu of estimating sigma, and they generally ignore the finite population correction (fpc) factor, so they may even recommend a sample size larger than your population! If you don't use something relevant to your data for one question/attribute, you cannot get anywhere with more than one question.
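To see the trap in numbers, here is what a typical calculator formula does versus the same formula with the fpc, for a deliberately small (hypothetical) population:

```r
# The "calculator" trap: the usual online formula applies only to yes/no data,
# assumes p = 0.5, and skips the finite population correction (N is hypothetical).
z <- 1.96; e <- 0.05; p <- 0.5
N <- 300                                   # a small population

n_naive <- ceiling(z^2 * p * (1 - p) / e^2)   # ~385, larger than N!
n0 <- z^2 * p * (1 - p) / e^2
n_fpc <- ceiling(n0 / (1 + (n0 - 1) / N))     # about 169 with the fpc
c(naive = n_naive, with_fpc = n_fpc)
```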
The kind of data you are collecting and your goals are important. You might want to ask another question and include more specifics. (You mentioned "Tohmas thempson." Maybe that was a hint at your application, but I don't understand. Did you mean "Horvitz-Thompson"?)
If you are looking at Likert-scale data, as many on ResearchGate seem to do, then there may be texts specifically on that, but it is not my area and I don't know of any.
I would like to add to the list of suitable methods suggested by William Winkler the one implemented in the "SamplingStrata" R package.
It is a method that makes it possible to optimise the stratification of a given population frame, together with the allocation of units in the resulting strata, given precision constraints set on a number of target variables. It is also possible to set different precision constraints in different domains of interest.
The method and the software are described in the paper:
Giulio Barcaroli (2014). SamplingStrata: An R Package for the Optimization of Stratified Sampling. Journal of Statistical Software, 61(4), 1-24. URL http://www.jstatsoft.org/v61/i04/.
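A heavily hedged sketch of the SamplingStrata workflow as described in that paper; the toy frame, its column layout (X*, Y*, domainvalue), and the argument defaults follow the 2014 JSS article as I recall it and may differ in current package versions, so check the documentation before relying on it:

```r
# Sketch only: function names and data layout follow Barcaroli (2014, JSS);
# the frame below is invented for illustration.
library(SamplingStrata)

set.seed(7)
frame <- data.frame(
  X1 = sample(1:3, 200, replace = TRUE),   # categorical stratification variables
  X2 = sample(1:2, 200, replace = TRUE),
  Y1 = rgamma(200, shape = 2, scale = 10), # target variables
  Y2 = rnorm(200, 50, 10),
  domainvalue = 1
)

strata <- buildStrataDF(frame)             # atomic strata built from the frame

# Precision constraints: target CVs for each target variable, per domain.
errors <- data.frame(DOM = "DOM1", CV1 = 0.05, CV2 = 0.10, domainvalue = 1)

solution <- optimizeStrata(errors = errors, strata = strata)  # genetic algorithm search
str(solution)  # optimised stratification and allocation (see the package docs)
```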
"I used model-based sampling and prediction for small sample sizes from many small populations, for official statistics, and each attribute had its own model weights, so there was not such a problem as having a sample selection probability that is good for the estimator for one data item and terrible for another."
With regard to ratio estimation, this relates to something I recently noted in a poster presentation:
If you use efficient probability-of-selection (design-based) sampling for ratio estimation, then the 'optimal' pi-values (probabilities of selection) would only be based on one y (one question on the survey). Such a set of pi-values may be quite bad for other survey questions. (Note how this relates to the comical, but instructive story: "Basu's Elephant Fable" from the 1970s.) However, if one uses the model-based (prediction) approach, then the predictions are based on the size measure (using x-values or, for multiple regression, a combination of regressors such as the predicted-y values) which is appropriate for each individual question, y. The Holmberg (2007) paper I mentioned above may give you a good compromise set of pi-values, and you may use an adjusted design-based ratio estimator (noted, for example, in Cochran (1977), Sampling Techniques, I think), but the model-based option seems preferable to me. It is simple and easy to interpret, and often very efficient, with overall low uncertainty compared to the design-based approach. (See the appendix to https://www.researchgate.net/publication/317914104_Handout_Bibliography_for_Comparison_of_Model-Based_to_Design-Based_Ratio_Estimators_Poster.)
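A small simulated sketch of that last point: with the model-based (prediction) ratio estimator, each survey item gets its own ratio "weight" from the same known size measure x, so no single set of selection probabilities has to serve every item. Everything below is simulated purely for illustration:

```r
# Model-based (prediction) ratio estimation: estimated total = observed sample
# total + ratio prediction for the non-sampled units, item by item.
set.seed(42)
N <- 500
x  <- rgamma(N, shape = 3, scale = 20)          # known size measure
y1 <- 5 * x   + rnorm(N, 0, 2 * sqrt(x))        # two survey items with different
y2 <- 0.2 * x + rnorm(N, 0, 0.5 * sqrt(x))      # relationships to x

s <- sample(N, 50)                              # the sampled units

ratio_total <- function(y_s, x_s, x_all, s) {
  b <- sum(y_s) / sum(x_s)                      # this item's own ratio "model weight"
  sum(y_s) + b * sum(x_all[-s])                 # observed part + predicted part
}

c(T1_hat = ratio_total(y1[s], x[s], x, s), T1 = sum(y1),
  T2_hat = ratio_total(y2[s], x[s], x, s), T2 = sum(y2))
```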