I am currently planning a study to find the prevalence of hydatidosis among domestic cattle in remote areas. The prevalence there is unknown, as no research has yet been conducted in that area.
One of the problems most researchers face is that they have no information on the prevalence of the disease they want to study. In 1986, I was appointed Senior Research Officer at the Desert Medicine Research Center, Jodhpur, Rajasthan, India. The very first problem I faced was determining the sample size. The objective was to carry out a general health survey in the state and to identify the major health problems of the area. Little information was available, and there was no disease whose prevalence could serve as a reference for determining the sample size. After careful thought, I decided to fix the sample size so that we could study all diseases with a prevalence of 1%. A sample size chosen this way should also be adequate for every other disease with a prevalence above 1%. With this background, I come back to your problem. You may follow these steps:
1) The formula for calculating the sample size is n = 4pq/L^2
where p is the prevalence of the disease under study, q = 1 - p, and L is the tolerable (absolute) error in the estimate of the prevalence. The factor 4 is approximately 1.96^2, i.e. it corresponds to 95% confidence.
2) Assume the prevalence of hydatidosis among domestic cattle to be 1%.
3) Then n = 4*(0.01)*(0.99)/(0.002)^2 = 9900; here I have assumed L = 0.002, i.e. 20% of the assumed prevalence (a relative error of 20%).
4) So you would need to examine 9900 cattle for the study.
5) Note that if you assume a prevalence of 2%, 3% or 5% instead, the sample size decreases accordingly (keeping L at 20% of the prevalence).
6) For example, for a 2% prevalence with L = 0.004 (again 20% of p), the formula gives a sample size of 4900.
7) I hope this meets your immediate requirement. If you still have any queries, feel free to ask.
8) Note that when the expected prevalence is low, say 1% or 2%, using a smaller sample than required will not give reliable results.
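The steps above can be checked with a short script. This is a minimal sketch, assuming that the "20% accuracy" means a relative error, i.e. L = 0.2p, which is the reading that reproduces both 9900 (for 1%) and 4900 (for 2%). The function names sample_size and sample_size_relative are hypothetical, chosen for illustration; exact fractions are used so that rounding up to a whole animal is not thrown off by floating-point error.

```python
from fractions import Fraction
import math


def sample_size(p, L):
    """n = 4pq/L^2 for estimating a prevalence p to within +/- L (absolute).

    The factor 4 is roughly 1.96^2 (about 95% confidence). Inputs are given
    as strings such as "0.01" so Fraction keeps the arithmetic exact.
    """
    p = Fraction(p)
    L = Fraction(L)
    q = 1 - p
    return math.ceil(4 * p * q / L ** 2)  # round up to whole animals


def sample_size_relative(p, rel_error="0.2"):
    """Same formula with L tied to p (assumed: L = 0.2 * p, the 20% above)."""
    p = Fraction(p)
    return sample_size(p, Fraction(rel_error) * p)


if __name__ == "__main__":
    for prev in ["0.01", "0.02", "0.03", "0.05"]:
        print(f"p = {prev}: n = {sample_size_relative(prev)}")
```

With L tied to p, the formula reduces to n = 4q/(0.04p) = 100q/p, which falls as the prevalence rises (3234 for 3%, 1900 for 5%); this is why a sample sized for 1% also covers higher prevalences. With a fixed absolute error the opposite happens: sample_size("0.02", "0.002") gives 19600, because 4pq grows as p approaches 0.5.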
You can use either this online source (re: fast calculator), which is quite easy and user-friendly, or my annotated SAS code example below, in which you would just need to vary the parameters of the power analysis to explore the consequences of different assumptions. If I can be of any further assistance, please let me know.
title 'Power for comparison of proportions [estimation of odds ratio] between 2 cardiac AAb risk strata (total n=800)';
proc power;
   logistic
      alpha = 0.05
      vardist('aab_pos') = binomial(0.08, 1) /* distribution of a dichotomous predictor, "high-risk group, yes/no": (proportion of people with a positive test, number of trials [do not change]) */