I am going to perform a household survey. I will have a list of the houses in my survey area. One house can be occupied by more than one household. How can the households be chosen randomly?
Strictly speaking, the selections cannot be fully independent, since there is a finite number of houses: the choice of the first one affects the next choice, and so on until only one is left, which then has no choice but to be chosen. But you can use the following procedure:
Number each house on your list from 1 to N.
Determine the population size and the sample size.
Select a starting point on a random number table. The best way to do this is to close your eyes and point randomly onto the page. Whichever number your finger is touching is the number you start with.
Choose a direction in which to read (up to down, left to right, or right to left).
Select the first n numbers (n being your sample size) whose last X digits are between 1 and N, where X is the number of digits in N. For instance, if N is a 3-digit number, then X would be 3. Put another way, if your population contained 350 houses, you would use numbers from the table whose last 3 digits were between 001 and 350. If the number on the table was 23957, you would not use it because the last 3 digits (957) are greater than 350. You would skip this number and move to the next one. If the number is 84301, you would use it and you would select the house in the population that is assigned the number 301.
Continue this way through the table until you have selected your entire sample of n, skipping any number that repeats one you have already chosen. The numbers you selected then correspond to the numbers assigned to the members of your population, and those selected become your sample.
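The table procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_with_random_number_table` is made up, and the "table" is a seeded list of 5-digit pseudo-random numbers standing in for a printed random number table.

```python
import random

def sample_with_random_number_table(table, population_size, sample_size):
    """Pick unit numbers in 1..population_size by reading a table of random numbers."""
    digits = len(str(population_size))        # X: how many trailing digits to read
    chosen = []
    for number in table:
        candidate = number % 10 ** digits     # last X digits of the table entry
        # Skip values outside 1..N and values that were already selected.
        if 1 <= candidate <= population_size and candidate not in chosen:
            chosen.append(candidate)
        if len(chosen) == sample_size:
            break
    return chosen

# Assumed stand-in for a printed table: 1000 five-digit pseudo-random numbers.
rng = random.Random(2024)
table = [rng.randrange(100000) for _ in range(1000)]
table_sample = sample_with_random_number_table(table, population_size=350, sample_size=20)
```

With the numbers from the example above, 23957 is skipped (957 is above 350) and 84301 selects house 301.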
If the number of households in a house is not a factor, then computer-based simple random sampling or lottery-based random sampling is good. If it matters, then stratified random sampling is more suitable.
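As a rough sketch of the stratified option, assuming you already know how many households each house contains: the houses are grouped into strata by household count and the same fraction is drawn from each stratum. The function name `stratified_sample`, the house IDs and the counts are all invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(house_sizes, fraction, seed=None):
    """house_sizes maps house ID -> number of households in that house."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for house, size in sorted(house_sizes.items()):
        strata[size].append(house)            # stratum = houses with the same household count
    picked = {}
    for size, members in strata.items():
        k = max(1, round(fraction * len(members)))   # at least one house per stratum
        picked[size] = rng.sample(members, k=k)      # simple random sample within the stratum
    return picked

# Hypothetical listing: 6 houses with 1 household, 4 with 2, and 2 with 3.
house_sizes = {f"H{i}": 1 for i in range(1, 7)}
house_sizes.update({f"H{i}": 2 for i in range(7, 11)})
house_sizes.update({f"H{i}": 3 for i in range(11, 13)})
stratified = stratified_sample(house_sizes, fraction=0.5, seed=11)
```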
I would like to ask first: do you have a list of the population, i.e. a list of all the households present? Theoretically, it is impossible to achieve randomization if you don't have a list of the population. Then come the questions of biased and unbiased errors.
You rightly said that one house can contain many households. Before the household survey, you need to check the universe from which you can draw the sample. Then you can think about a sampling design. @Seema Khadka
By fixing the characteristics of the universe first, you can reduce possible inconsistency in selecting the data set; adopting random sampling afterwards should then not add further bias to the data set. Priyakrushna Mohanty
The selection depends on the objective of your survey. First you have to have a clear definition of a household. You said you have a list of houses and that a house can have many households. So, based on that list, you should draw up a list of households that you will use for your selection.
You should weight the households according to the number of people in each household and make your selection randomly to meet your sample size.
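A small sketch of that idea, assuming each house has already been visited once to list its households: the house list is flattened into one household list, and a simple random sample is drawn from it. The house and household IDs here are invented.

```python
import random

# Hypothetical listing: house ID -> households found in that house.
houses = {
    "H1": ["H1-a"],
    "H2": ["H2-a", "H2-b", "H2-c"],
    "H3": ["H3-a", "H3-b"],
    "H4": ["H4-a"],
}

# Flatten the house list into a single household list (the sampling frame).
household_frame = [hh for members in houses.values() for hh in members]

# Simple random sample of 3 households, drawn without replacement.
rng = random.Random(7)
srs_households = rng.sample(household_frame, k=3)

# The houses that have to be visited to reach the selected households.
houses_to_visit = sorted({hh.split("-")[0] for hh in srs_households})
```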
If you consider cluster sampling, each house could be a "cluster" of one or more households. You could use a randomized selection of houses, even simple random sampling, and then census the households in the randomly selected houses. Or as a two-stage sample design, you could use a randomized design, even simple random sampling, at that second stage, for the households within the selected houses.
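Both variants can be sketched as follows, under the assumption that the households in a house are listed once the house has been selected. The function `two_stage_sample` and the data are hypothetical; passing `n_per_house=None` gives the one-stage version with a census inside each selected house.

```python
import random

def two_stage_sample(houses, n_houses, n_per_house, seed=None):
    """Stage 1: simple random sample of houses. Stage 2: households within each."""
    rng = random.Random(seed)
    selected_houses = rng.sample(sorted(houses), k=n_houses)
    result = {}
    for house in selected_houses:
        households = houses[house]
        if n_per_house is None:
            result[house] = list(households)          # census of the selected house
        else:
            k = min(n_per_house, len(households))     # cannot take more than exist
            result[house] = rng.sample(households, k=k)
    return result

# Hypothetical listing: house ID -> households in that house.
survey_houses = {
    "H1": ["H1-a"],
    "H2": ["H2-a", "H2-b", "H2-c"],
    "H3": ["H3-a", "H3-b"],
    "H4": ["H4-a"],
}
two_stage = two_stage_sample(survey_houses, n_houses=2, n_per_house=1, seed=3)
one_stage = two_stage_sample(survey_houses, n_houses=2, n_per_house=None, seed=3)
```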
HOWEVER, please note that even if you use simple random sampling of houses, and then either a census, or simple random sampling of households in the selected houses, you do NOT have a simple random sample overall. If you are collecting quantitative information, you may find what you need in a textbook or online to make estimates, inferring from your one-stage cluster or your two-stage design to the population. However, if you just want to say you have a simple random sample, you cannot do that without a list of all households, from which you'd draw a simple random sample, and then see what houses you need to visit. You do not have such a list, so simple random sampling is not an option.
The point I'm trying to make is that although you can use a randomized design, for which there is a good deal of literature (see, for example, Cochran (1977), Sampling Techniques, 3rd ed., Wiley), you will not, in general, have a design where each household has an equal chance of being selected. It may be close enough for your purposes, but it isn't the same thing as overall simple random sampling.
Will it be close enough if you are doing interviews and you just wanted every household to have an equal chance of selection? Maybe. If the number of houses is much, much larger than the average number of households per house, that helps, and meeting that condition is almost certainly not a problem. If the number of households per house does not vary very much, then that helps, especially if that number is low. (After all, one household per house would make overall simple random sampling possible.) I think that it is best to then take one household per house to be closer to simple random sampling. I think that is particularly true if the variance within clusters is relatively low.
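To see how far the chances can drift apart, here is a small calculation under assumed numbers (200 houses, 40 of them sampled, simple random sampling at both stages). The function name and the figures are illustrative only. A household's chance of selection is the chance its house is drawn multiplied by the chance it is picked inside that house.

```python
from fractions import Fraction

def inclusion_probability(houses_total, houses_sampled, households_in_house, households_taken):
    """Chance that one given household ends up in the sample."""
    first_stage = Fraction(houses_sampled, houses_total)
    # Within a selected house you cannot take more households than it contains.
    second_stage = Fraction(min(households_taken, households_in_house), households_in_house)
    return first_stage * second_stage

# One household interviewed per selected house.
p_single = inclusion_probability(200, 40, households_in_house=1, households_taken=1)
p_triple = inclusion_probability(200, 40, households_in_house=3, households_taken=1)

# All households interviewed in a selected house that has three of them.
p_census = inclusion_probability(200, 40, households_in_house=3, households_taken=3)
```

Under these assumptions a household living alone in its house has a chance of 1/5, while a household sharing a house with two others has a chance of 1/15 when only one household per house is interviewed. If every household in a selected house is interviewed, the chance is 1/5 again, but households in the same house are then always selected together. When estimates are needed, each household is usually weighted by the inverse of its chance of selection.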
I would choose households by applying a rand function. Or you may go for a fixed ratio, for example one household out of every 3 or 5. Suppose a lane has 50 households and you take every fourth one: skip the first three and take the fourth, then skip the next three and take the eighth, and so on. In this way you can reduce bias. You can also go for stratified sampling, that is, divide your entire population of households into different groups and then randomly select households from each group.
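The fixed-ratio idea is usually called systematic sampling. A minimal sketch, assuming the households in a lane are listed in walking order; the one change from the description above is that the starting position is chosen at random rather than always being the fourth household. The names are made up.

```python
import random

def systematic_sample(units, interval, seed=None):
    """Take every `interval`-th unit, starting from a random position in the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(interval)       # random start: 0 .. interval-1
    return units[start::interval]

# Hypothetical lane of 50 households, taking one in every four.
lane = [f"household-{i}" for i in range(1, 51)]
every_fourth = systematic_sample(lane, interval=4, seed=1)
```

Depending on the random start, this gives 12 or 13 households from a lane of 50.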