CASE-1: KNOWN POPULATION SIZE: A common approach is the Yamane method, although it is not the most efficient. The approach is simple and straightforward: (i) take the population size N as given; and (ii) set the confidence level (CL), from which alpha = 1 - CL. The minimum sample size is calculated as follows:
(1) n_Y = N / (1 + N e^2), where e = alpha level
Suppose the population is 50,000 and the confidence level used is 95%; then e = 0.05. The minimum sample size is simply:
n_Y = 50,000 / (1 + 50,000(0.05^2)) = 50,000 / 126 ≈ 396.83, which rounds up to 397. Under this method, the sample size is approximately 400 in most cases, because for large N at e = 0.05 the result approaches 1/e^2 = 400. Here you have two types of population (family and non-family firms); using equation (1), you may determine n_Y1 and n_Y2.
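The calculation in equation (1) can be sketched in Python as below. This is only an illustration: the function names are my own, and the two firm population sizes at the end are hypothetical placeholders, not figures from this thread.

```python
import math


def yamane_raw(population_size: int, e: float) -> float:
    """Unrounded Yamane sample size: N / (1 + N * e^2)."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < e < 1:
        raise ValueError("e must lie strictly between 0 and 1")
    return population_size / (1 + population_size * e ** 2)


def yamane_sample_size(population_size: int, e: float) -> int:
    """Yamane sample size rounded up to the next whole unit."""
    return math.ceil(yamane_raw(population_size, e))


# Worked example from the text: N = 50,000 and e = 0.05.
print(round(yamane_raw(50_000, 0.05), 2))  # 396.83
print(yamane_sample_size(50_000, 0.05))    # 397

# Hypothetical population sizes for the two groups (assumed values).
firm_populations = {"family": 30_000, "non_family": 20_000}
for group, size in firm_populations.items():
    print(group, yamane_sample_size(size, 0.05))
```

Rounding up rather than to the nearest integer is a design choice here, so that the computed minimum is never undershot.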
CASE-2: UNKNOWN POPULATION SIZE: For a non-finite population, use:
(2) n = (Z^2 σ^2) / E^2
... where Z = critical value of Z at the specified confidence level; σ = estimated population standard deviation; and E = standard error, given by E = σ/sqrt(n_test). This requires a test sample: an initial sample taken for the purpose of obtaining descriptive and inferential statistics. The test sample may be obtained through random sampling. What size should the test sample be? n_test > 30 is a rule of thumb for most cases.
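Equation (2) can be sketched in Python as follows. The function names are illustrative, and the values of sigma and n_test in the example are hypothetical; in practice they would come from your own test sample.

```python
import math
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Two-sided critical Z value, e.g. about 1.96 for 0.95."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_unknown_population(z: float, sigma: float, error: float) -> int:
    """Equation (2): n = (Z^2 * sigma^2) / E^2, rounded up."""
    if sigma <= 0 or error <= 0:
        raise ValueError("sigma and error must be positive")
    return math.ceil((z ** 2 * sigma ** 2) / error ** 2)


# Hypothetical test-sample results (assumed values, not real data).
sigma = 12.5   # standard deviation estimated from the test sample
n_test = 36    # test sample size, above the n > 30 rule of thumb

# E as defined in the text: sigma / sqrt(n_test).
error = sigma / math.sqrt(n_test)

z = z_critical(0.95)
n = sample_size_unknown_population(z, sigma, error)
print(n)
```

Note that when E is taken as σ/sqrt(n_test), σ cancels and the result equals Z^2 × n_test rounded up. If you have a target margin of error instead, pass that value as `error`.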
TWO POPULATIONS: For a finite population size, follow CASE-1 above. For a non-finite population size, use CASE-2 above to obtain n1 and n2, i.e. take two test samples.
SHOULD THESE SAMPLES (FAMILY OWNED & NON-FAMILY-OWNED) BE COMBINED? It is most likely that the required sample size for Firm1 and Firm2 would be of different sizes; if so, then test for their homogeneity to verify whether they could be combined or remain separated for purposes of two group studies. The following test statistics may be used to verify if they are significantly different:
If the difference is significant, then the two populations (family and non-family firms) must be sampled independently: n1 and n2. This step may seem redundant if n1 and n2 above will be taken independently anyway; however, it is better to be safe than to have to answer the question later: how did you prove that your two populations are different, to justify taking two different samples? If the two populations are indeed non-homogeneous, then the samples must be taken independently as described in (2).
REFERENCES: For binomially distributed data, see the attached Agresti article.
(1) Westland, J. Christopher (2010). "Lower bounds on sample size in structural equation modeling". Electronic Commerce Research and Applications 9 (6): 476–487.
(2) Nunnally, J. C. (1967). Psychometric Theory. New York: McGraw-Hill, p. 355.
(3) Yamane, Taro (1967). Statistics: An Introductory Analysis, 2nd ed. New York: Harper and Row.
For Tanzania, family firms and non-family firms both form part of the private firms, whose population is known, say X. The problem is how to obtain n1 (family firms) and n2 (non-family firms), which are unknown, from the known population. Thank you.