Let us suppose we have observations on 500 people for their blood group, recorded as A, B, AB or O. How can I analyze whether the proportions differ significantly? I want to do it with Stata.
If you just have one group of persons, with neither expected proportions nor a comparison group, I fear you may not be able to run any test (as there is no hypothesis to test).
That is what I want to ask. I just have a list of 500 persons with their blood groups only. My null hypothesis is that the blood group does not vary. How can I test this using Stata?
If I understand your question, I can recommend you use the command tabi.
For simplicity, I will try to answer you with a “real” example.
You can download the Excel spreadsheet “Example” attached below, to have exactly the same thing. I simulated a distribution of different blood groups in a sample of 500 persons using realistic proportions.
1° First, you have to import this dataset into Stata (I guess you know how). I chose to display all variable names in lower case.
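In case the menu route is unfamiliar, the import can also be done from the command line. This is only a sketch: it assumes the attachment was saved as Example.xlsx in the current working directory (the file name and extension are my assumption), and that the first row holds the variable name, so that the column ends up as bloodgroup.

```stata
* Assumption: the spreadsheet is saved as Example.xlsx in the working directory
* firstrow    = treat the first row as variable names
* case(lower) = convert variable names to lower case
import excel using "Example.xlsx", firstrow case(lower) clear

* Check that the variable arrived under the expected name
describe
```

The options firstrow and case(lower) correspond to "Import first row as variable names" and the "lower" variable case setting in the import dialog.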
2° Then, you have to calculate the proportions of the different blood groups in your sample, by creating a one-way table using the command tabulate:
. tabulate bloodgroup
3° Now that you have the distribution of the different blood groups in your sample (196, 20, 46 and 238 for A, AB, B and O, respectively), you may test your hypothesis, which is that the blood group does not vary. So, we can consider that your hypothesis is that each blood group should represent 25 % (so, 125 subjects). It means that you want to see if your distribution (196, 20, 46, 238) is different from the “theoretical” distribution of (125, 125, 125, 125).
P.S. This is not really biologically meaningful. In any population, the proportions of the different blood groups are very different. What could be meaningful would be to compare the distribution in your sample to a known distribution from the population or from another group. But as the procedure is the same, we can proceed as described below.
You have to create that table manually, using the command tabi as below (I chose to show the column percentage of each blood group, and to use Pearson’s chi-squared test to test the null hypothesis):
. tabi 196 125\20 125\46 125\238 125, chi2 column
Now you have a table showing the proportions of the different blood groups in your sample, and comparing them to a hypothetical second sample of the same size in which the four groups are equally represented (25 % each).
The p-value here is < 0.0001, so the blood groups are not equally distributed in your sample.
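One caveat worth checking: the tabi table treats the equal split as if it were a second observed sample of 500 people, so strictly speaking it tests whether two samples share the same distribution. If the 25 % split is meant as a fixed theoretical distribution, the one-sample goodness-of-fit statistic, the sum of (observed - expected)^2 / expected with 3 degrees of freedom, can be computed by hand. The sketch below uses only display and chi2tail(); the scalar name gof_chi2 is just my own label.

```stata
* Observed counts for A, AB, B and O: 196, 20, 46, 238
* Expected count under equal proportions: 500 / 4 = 125 per group
scalar gof_chi2 = (196-125)^2/125 + (20-125)^2/125 + (46-125)^2/125 + (238-125)^2/125

* Degrees of freedom = number of categories - 1 = 3
display "chi2(3) = " gof_chi2
display "p-value = " chi2tail(3, gof_chi2)
```

With these counts the statistic works out to about 280.6, so the conclusion is the same: the p-value is far below 0.0001. To compare against known population proportions instead, replace each 125 with 500 times the corresponding population proportion.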
Maybe there was a problem while importing the dataset. Can you check whether all the options were correct? Make sure you checked the option "Import first row as variable names" and selected "lower" for variable case (see arrows on the picture attached below).
If it doesn't work, try to use your own dataset, following the same procedure.