I am working on a set of data and I am confused about the use of the Chi square table. I get two values in R: one is the chi square value and the other is the p value. Which one should I compare with the table? I have gone through a lot of notes on the web but each of them says a different thing. At the 0.05 significance level, which value should I compare, and what is the role of the other value in that regard? For example, if the chi square value is to be tallied with the table value at the 0.05 level of significance and the table value is less, is the result significant or not, and then what is the use of the p-value written beside it?
Guidance will be highly appreciated.
.
The p-value is just the probability that, under the null hypothesis H0, the chi square statistic (Chi2) would come out greater than the value computed from your data (Chi2Data):
p-value = Prob(Chi2 > Chi2Data | H0)
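To see that probability numerically, here is a minimal Python sketch (Python purely for illustration; in R the equivalent is pchisq(Chi2Data, df, lower.tail = FALSE)). The closed form below holds only for 1 degree of freedom, i.e. the 2 x 2 case:

```python
import math

def chi2_sf_df1(chi2_value):
    """Tail probability P(Chi2 > chi2_value) for 1 degree of freedom.

    A chi-square variable with 1 df is the square of a standard
    normal, so its tail probability is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2_value / 2.0))

# The classic 5% critical value for 1 df is 3.841, so the p-value
# at the critical value is the significance level itself:
p = chi2_sf_df1(3.841)
print(round(p, 3))  # about 0.05
```

This is exactly why comparing Chi2Data with the tabulated critical value, and comparing the p-value with 0.05, always give the same conclusion.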
.
the following wikipedia entry might help you clarify the notion of p-value
https://en.wikipedia.org/wiki/P-value
.
In the old days, performing a Chi2 test would mean:
1. computing the Chi2 value from your data (Chi2Data),
2. looking up a Chi2 table, with the appropriate number of degrees of freedom, to find the p-value (or the critical value at your chosen significance level),
3. comparing the result with your significance level to draw a conclusion.
Nowadays, R does steps 1 and 2 for you (although the choice of the number of degrees of freedom may sometimes be a little tricky) ... you just have to compare the p-value to your significance level to conclude in step 3.
.
http://www.nature.com/nmeth/journal/v12/n3/full/nmeth.3288.html
We really need to know what you're doing the test for! It could well be that the magic number of 0.05 is not the one you need. Could I suggest you look at the website:
http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1476-5381/homepage/statistical_reporting.htm
(make sure you get that all on one line), and then look at the article titled "Data interpretation - using probability".
That could help!
Hi Zaara,
Chi Square measures the goodness of fit of your model, and the p value is the significance value of your test. For example, in a hypothesis test your results support your hypothesis at the .05 significance level (p = .05), which means that you are 95% confident that your results will repeat, i.e. the hypothesis will be supported, but there is still a 5% chance that you will get different results.
Your question may be best answered if you give some data from your experiment or study.
Best Regards
Rafique
No, there is NOT 95% confidence that the results will repeat!
The p value is the likelihood of obtaining YOUR results if the samples you are comparing had in fact come from the same population. With small samples, their reliability (the extent to which they truly represent the source population) is limited. You cannot use a P value, on its own, to interpret the results of a study. With small samples, P values can vary widely. If you were to repeat a study with small samples, you could get a substantially different result. Please look at:
Nature Methods (2015) vol 12 issue 3 p 179.
The Chi square test is a test of whether two populations have a similar distribution of a categorical variable.
@ Gordon dear sir, thank you very much for directing me to this article.
But the majority of statistics books, and academics, interpret the results like this!
Should we consider them correct too?
thank you very much once again.
You're absolutely right, most statistics textbooks take a "classical" (Neyman-Pearson) approach which is frequently inappropriate, and there are far better ways of doing things. Basic scientists are not statisticians and just go on doing the same old same old. It's the same way that economics classes use a model of economics that takes no account of the fallibility of human behaviour and failed to predict the banking crash. Also, many textbooks contain errors. They are not as well reviewed as journals! Look at
Int J Epidemiol (1988) vol 17(2) p 245, which is a simple, short article.
As for the basics of testing, Brit J Clin Phar (1982) vol 14 p235
You will see from the dates of these articles that this problem has been recognised for a long time - and it's still with us!
So to answer your question: do not consider books to be true. Life evolves.
Hello there,
The P value tells you whether there is a significant difference between the groups across a number of categories. Unfortunately, if there are more than 2 categories you are not told where exactly that significant difference (if the p value is below .05) lies...
Best wishes
The P value tells you the likelihood of finding your data IF the null hypothesis were true: if you have several samples, then there are several ways these samples can be unlikely. The P value is not a measure of significance until you have decided (often arbitrarily) to accept a certain value as being so unlikely as to be "significant". So P and significance are only related when the experimenter decides. The P value is a probability: it does not tell you what it signifies; you have to decide!
Hello everyone
I think giving examples is the best way to explain things.
Suppose we compare the mean weight of people in a remote village to that of the country population. Let's say that the one-sample T-test gave a p-value of 0.03. This means that the chance of getting such a mean weight value in the remote village is 3%.
In other words, if we concluded that the population mean weight of the remote village is different from that of the country population (we can say that, as the p-value is below 0.05), the chance of being wrong would be 3%.
I am afraid that the P value doesn't indicate the chance of being wrong. It indicates the probability that, if the two samples we are comparing had in fact been drawn from the same population, samples as extreme as (or more extreme than) the ones we have could have occurred. This is NOT the same thing as the chance of being wrong. Think for a moment about power (which IS to do with the likelihood of a false positive or a false negative conclusion). This is related not to the P value but to the sample sizes. With small samples, if we get a P value < 0.05, then a repeat study is very likely to give a P > 0.05. I suggest you look at Nature Methods vol 12(3) 179-185, March 2015. I shall try to attach a copy.
The P value is overused, overvalued, and hard to interpret when considered alone!
Then what is df value besides the p-value displayed at the output in R?
df is the degrees of freedom. Depending on how the data are arranged, it helps determine which probability distribution is used. It helps to be sure that a statistical analysis has been done correctly. It varies according to the "dimensions" of the data. The simplest way to explain it is to think of a mean: if you have a mean value, then the degrees of freedom of the data that are used to make up the mean will be one less than the total number of values summarised by the mean. The last value isn't "free" because it has to have a magnitude that results in the final mean.
I've looked back to the start of this thread. Just to give another example of how to explain "degrees of freedom" why not think of a simple 2 x 2 Chi squared test. When we do this test we should write out the number of observations in each of four classes: good result from treatment 1 (a), good result from treatment 2 (b), bad result from treatment 1 (c), and bad result from treatment 2(d).
           treatment 1   treatment 2   margin
good           a             b          a+b
bad            c             d          c+d
margin        a+c           b+d       a+b+c+d
The marginal values are "fixed" in that they represent numbers that are either determined before the results (we choose the total a+b+c+d, and we choose the numbers that get treatment 1 and treatment 2), or fixed by our hypothesis: we usually make the null hypothesis, and expect that the good and bad outcomes are going to be the same in the two treatment groups, so that the overall outcome for the two groups will be the same as the marginal outcomes; good outcomes would be seen in the proportion (a+b) / (a+b+c+d).
SO: it should be clear that if these marginal values are fixed, then as soon as you define one of the values in the table (a, or b, or c, or d), let's say you fill in the value for a, the other values are fixed as well. Here, we have only ONE degree of freedom.
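That "one free cell" idea can be seen in a toy Python sketch (made-up margins, purely for illustration): once the margins and a single cell are fixed, the other three cells are forced.

```python
def fill_2x2(row1_total, row2_total, col1_total, a):
    """Given fixed margins and the single free cell a,
    the other three cells of a 2 x 2 table are determined."""
    b = row1_total - a   # rest of the first row
    c = col1_total - a   # rest of the first column
    d = row2_total - c   # whatever remains
    return [[a, b], [c, d]]

# made-up margins: row totals 62 and 20, first column total 40;
# choosing a = 30 pins down the entire table
print(fill_2x2(62, 20, 40, 30))  # [[30, 32], [10, 10]]
```

Whatever value of a you pick, the rest of the table follows: that is exactly why a 2 x 2 table with fixed margins has one degree of freedom.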
Dear Gordon Blair Drummond Sir
Please guide me on what to do when the p-value in a chi square test comes out at more than 0.05. How should I treat such a result? My sample size is 400.
Chi square and P-values are 2 different things (although they measure something very similar).
If you are measuring the significance of your result (such as whether the p value is below 0.05), it is the p value that you compare with your chosen significance level.
Both these values (Chi square and P value) are used when we test the same theoretical "hypothesis". We postulate that there is NO difference between where the samples have been taken from. (There will always be a difference between samples, because the random drawing of samples from a population, a large number of individuals, will most likely give different sets of values. But these two, slightly different, samples could still have been drawn from the same large collection of individuals, the "population".)

For the Chi square test, the samples are categorical: they are one thing or another. The simplest form is the 2 x 2 test, but that isn't the only form: the test could use several categories, not related, like language spoken, hair colour, and gender, in relation to a finding such as passing or failing an exam. The test we do on categories like this gives a Chi squared value, which is an estimate of the goodness of fit of one category in relation to others. It is calculated from the observed frequency of a feature (hair colour, for example) in relation to the expected frequency (how many people in the entire experimental group have red hair, for example). If the null hypothesis were true, the samples would only differ because of random variation. The Chi squared test can estimate how big a part random effects play in the results, and from that we can get a P value: the probability that the data you have observed could have been found IF the samples were drawn from the same population. NOTE, as I have said in this trail before: this is not the probability that the samples are different; it's the probability that the data could arise if the samples were from the SAME population. If the value is small, we are less inclined to accept the null hypothesis, and thus more likely to conclude that the samples have come from DIFFERENT populations: but we would have to do more tests to find out how different the populations are likely to be.

So to get back to the question you ask: many statistical tests use the same idea. They ask "could these samples be different?", but turn it round to ask "how likely is it that these samples are 'the same', i.e. from the same underlying population (random handfuls taken from the same bag of barley grains)?"

So lots of tests (the Chi squared test, Student's T test, analysis of variance) all give you a P value, and that goes some way to saying "these samples are unlikely to be drawn from the same single, defined population". If you accept that, then the fun starts: is there a BIG difference? How can we be sure? And importantly, if the same study were repeated, would we get the same result? Usually, the answer is no, because studies are usually under-powered.
I have recently read a very clearly explained book called "Starting out in Statistics" by P de Winter and P Cahusac, published by Wiley Blackwell, ISBN 978-1-118-38401-5. I think it's a good place to start.
Hope this helps
Gordon Blair Drummond, sir, I was working on finding the effect of peers on students' English learning. Using Chi Squared, I found the df, the Chi Squared value and the p value. If the p value is used to determine the significance of the relationship between variables, then what is the purpose of the remaining values, i.e., the df and the chi square value?
When we look at the numbers in a Chi sq test, there are some that are fixed, and some that could possibly vary. This leads to the idea of "degrees of freedom". So let's say you have 82 students in your study - that has been fixed by external factors (time, that's all there are, all you have the money to recruit - lots of things that aren't relevant to the possibilities you may be testing). Another factor that MAY be fixed is the number of students who have English-speaking peers. You can't alter that in your experiment. (Some experiments may allow this, and the statistical considerations change a bit if you can manipulate this factor.)

Let's say you have 40 students with English-speaking peers, so you will have 42 who don't. Finally, let's say you categorise the students into those who learn English well (good) and those who don't (bad). You find that overall, in the 82 students, there are 62 that are good and 20 that are bad. These numbers define your degrees of freedom: we call them the marginal values; they are the sums of the cells in a 2 by 2 plot. What you do when you use the test is consider all the possible arrangements of numbers that can fill the inner cells of this 2 by 2 plot:
        PEERS   NO PEERS
GOOD      a        b        62
BAD       c        d        20
         40       42        82
where, for example, a + c must be 40, and a + b must be 62.
NOW: you hypothesise that their peers have NO effect on how well the students learn English. If this were so, one possibility (the most likely) would be that the same proportion of good students (62/82) would be found in both the peers and the no-peers columns. With the numbers I have suggested, there might be 31 in each column, or maybe 30 in the peers column and 32 in the no-peers column. Now you can see that the values c and d are fixed: the "degrees of freedom" indicate how many free choices there are, once you've fixed the marginal values.
So the table would have to look like this (if there were NO effect):
        PEERS   NO PEERS
GOOD     30       32        62
BAD      10       10        20
         40       42        82
You can look at this and see that if having peers had an effect, more extreme values of a, b, c, and d could be fitted into the table, while the marginal sums a+b and c+d stay the same: thus one possibility would be:
        PEERS   NO PEERS
GOOD     31       31        62
BAD       9       11        20
         40       42        82
Chance variation here could be quite large: 82 subjects, and an effect that is in the middle range. If you get a low P value despite a large df, then you can be more confident that the null hypothesis is not tenable. PLEASE note that the test doesn't show that there IS a relationship: only that the null hypothesis is unlikely to give the observed data. The Chi squared value is how we calculate this unlikeliness, using the numbers in the 2 x 2 table, and it is converted, using the df, into the P value, which is conveniently scaled from 0 (utterly unlikely) to 1 (absolutely likely). So the P value has to be "interpreted" with the help of the df. The chi squared value is the actual result of the test: if everyone had a table of these values, and knew the df, they could work out the P value for themselves.
Most of the tests we use have some sort of hypothesis to test. In fact there is a substantial logical difference between the commonly used T test and the Chi squared test. In the T test, two sets of measurements, such as height, are taken from two groups: there, the hypothesis is that the groups represent individuals that could have come from the same population - the NULL hypothesis. The interpretation has to include a reminder of this, so we say: "IF the null hypothesis WERE true, then the likelihood of observing the height values that we have in these two sets of measurements, or values that differ even more extremely, is 5 in 1000 (which is the same as a P of 0.005)." The conclusion would usually be: that's quite unlikely. The P value doesn't tell you anything more! It certainly should NOT be used to draw conclusions about the possibility that the two groups are NOT drawn from the same population, because the PREMISE of the test is that they ARE drawn from the same population. Starting to induce possibilities such as false negatives (the samples really are different, but we couldn't show the difference) is an invalid step away from the baseline hypothesis: the samples ARE from the same population.
BUT we use a different logical process when we do a Chi squared test, which is a test of association. Here we have a group of individuals (they may have been drawn from a larger population, but that is not central to the test theory). Let's say we have a cage of rats. In addition to normal rats, who have tails and ears, some rats have no tails, and some have no ears, and some sad rats have no tails or ears. Here are the data:
           Ears   No Ears
Tails       86      15
No tails    14      11
The question the Chi squared test answers is "how much association is there between ears and tails?", and the P value is 0.0055. You can look at these associations any way you like. This is done by considering one characteristic at a time. If we look at tails: out of all 126 rats (86 + 15 + 14 + 11), we can see that 101 (86 + 15) have tails, about 80%. If we look at those who have ears, then we can see 100 have ears and 26 have no ears, again about 80%. These are the proportions in the whole cage of rats, which we get by summing all the individuals. If there were no association between having ears and having a tail, then this 80:20 proportion should be present throughout. It isn't: in rats with tails (top row), 86 out of 101 have ears, but in the bottom row, the rats without tails, 14 have ears and 11 have no ears, which means only 56% of the rats without tails have still got their ears! With this size of sample, the result is quite robust: there is NOT a random distribution of tails and ears in the population; the link between having a tail and having ears is quite strong. A smaller sample would be more subject to random effects (if you had 5 pairs of gloves, and 3 were black and 2 were white, your random pick of two might give you 2 white gloves that were both for the left hand). There are some other factors to consider: some studies FIX the totals in some way (for example you may buy 50 black rats and 50 white rats, so you've already fixed one degree of freedom; you can even have studies in which both factors are fixed). With these variations, the statistical model varies. The best example is Fisher's exact test.
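The 80:20 arithmetic above can be checked with a few lines (a Python sketch of the same reasoning, using the rat counts from the table):

```python
# Rat table from the example: rows = tails / no tails, cols = ears / no ears
table = [[86, 15], [14, 11]]
n = sum(sum(row) for row in table)     # 126 rats in total
tails = sum(table[0])                  # 101 have tails
ears = table[0][0] + table[1][0]       # 100 have ears
print(round(tails / n, 2), round(ears / n, 2))   # both about 80%

# Under "no association", the expected number of eared rats among
# the 25 rats without tails keeps that same 80% proportion:
no_tails = sum(table[1])
expected = no_tails * ears / n
print(round(expected, 1), table[1][0])  # about 19.8 expected vs 14 observed
```

The gap between the 19.8 rats you would expect and the 14 you actually see is exactly what the Chi squared statistic accumulates, cell by cell.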
NOTE: these tests of association are framed a little differently from two-sample tests like the T test: here the null hypothesis is independence (no association between the two characteristics), and the question asked is one of "likelihood" - how likely the observed table would be if the characteristics were unrelated.
I have a data set where I compare values like HbA1c, statin treatment, blood pressure etc. between men and women. The data are taken from the same sample. When I perform the chi test, the values for some of the mentioned parameters show no or minimal difference, but the p-value is >0.05. That means the H0 of no difference is rejected. How come? I don't know what other tests to do to check if the p-value is correct. The samples are above 50, so I can't use Fisher's exact test.
Are you using the < and > signs correctly?
If P > 0.05, then the probability that the data could have come from the same population (in this case, the men and the women are considered to be the same population) is MORE than 5%. If you write X > 0.05, this means X is greater than 0.05.
Thinking about your result: you have to assume that there is NO difference between men and women. IF there were no difference, the likelihood of getting the data you have observed is more than 0.05, let's say 0.1. Most times we accept that this is quite a large possibility, so we cannot discard the possibility that the null hypothesis is correct.
There's another problem here. You can't really use the Chi squared test for values such as blood pressure, unless you categorise the blood pressure (normal, high). The Chi squared test is used for categorical data (male and female is fine, dead and alive, on statins or not on statins).
Why not post a simple example: men not on statins, men on statins, women not on statins, and women on statins, so you can make the simplest 2 x 2 table. Then it's easy to do a Chi sq to check. Actually you can do Fisher's on any sample size, but there is a slightly different theory behind that test; I would prefer the Chi sq.
Dear Gordon
Thank for taking time and answering my questions.
I am sorry, my mistake: by blood pressure I meant "having hypertension" or "not having". I know that blood pressure itself is tested with either a t-test (if normally distributed) or a Mann-Whitney test.
Here are my 4 specific examples that confuse me:
I compare males and females having or not having COPD.
All 30 observations (15% have COPD)
Male 20 (17% have COPD)
Female 10 (12% have COPD)
the p-value comes out 0.42 if I do Fisher and 0.36 with chi test.
HbA1c values, as a comparison of medians: 205 observations
Median value for whole population 39 (37-43)
Male: 39 (37-43)
Female: 39 (37-43)
p-value 0.70
How come they are all nearly the same and the p-value is so high, which means there is a difference?
Diabetes in the population: 205 observations
All 32 (16% have DM)
Male 17 (14% have DM)
Female 15 (18 % have DM)
P-value of 0.56 Fisher's test
Value of eGFR: 205 observations
All 74 (+/- 15)
Male: 73 (+/- 15)
Female: 74 (+/- 15)
t-test gives a p-value of 0.60
Thank you in advance for your guidance
A large P value shows that the observations do NOT support a difference.
For example:
               Men   Women   Total
COPD             3       1       4
No COPD         17       9      26
Summed values   20      10      30   (total patients studied)
If there were no difference between men and women, we would expect an incidence of COPD of 4 out of 30, that is about 13%.
That's pretty close to 3 out of 20 and 1 out of 10, for the men and women respectively. This means the null hypothesis (no difference between men and women) cannot be considered to be incorrect.
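To check numbers like these yourself, here is a Python sketch of the arithmetic (uncorrected chi-square; note R's chisq.test applies a Yates continuity correction to 2 x 2 tables by default, so R's p-value will differ somewhat):

```python
import math

a, b, c, d = 3, 1, 17, 9          # men/women, with and without COPD
n = a + b + c + d
# chi-square via the shortcut formula for a 2 x 2 table:
# n * (ad - bc)^2 / (row1 * row2 * col1 * col2)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p = math.erfc(math.sqrt(chi2 / 2.0))   # tail probability at 1 df
print(round(chi2, 3), round(p, 2))
```

The statistic is tiny and the p-value large, which is the numerical version of "3 out of 20 and 1 out of 10 are both close to 13%": nothing here speaks against the null hypothesis.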
You have only small numbers. But consider an extreme example: only women have COPD:
               Men   Women   Total
COPD             0       4       4
No COPD         17       9      26
Summed values   17      13      30   (total patients studied)
A Chi sq test now gives P = 0.026. This means that the null hypothesis (the incidence of COPD is not affected by gender) is unlikely: you'd only get a result like this about 26 times if you repeated your observations 1000 times.
You must understand the basis of the test: you're assessing how tenable the null hypothesis is. For most of your numbers, it is tenable: partly because the numbers are small.
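Incidentally, a p of 0.026 for that extreme table is what Fisher's exact test gives: sum the probabilities of every table with the same margins that is no more likely than the one observed. A stdlib-only Python sketch (R's fisher.test should give the same two-sided value):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    def prob(x):  # probability of the table with x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    observed = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed + 1e-12)

# the extreme table: 0 men and 4 women with COPD, 17 and 9 without
p = fisher_exact_2x2(0, 4, 17, 9)
print(round(p, 3))  # about 0.026
```

With counts this small the exact test is the natural check on whatever a chi-square approximation reports.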
Okay, thank you for your answer. I had thought that a large p-value rejects our H0 of no difference, which would mean that there is a difference; but when you look at the numbers, they do NOT look different. Okay, I will read your answer again and try to understand.
Thank you once again for your help
Even I am confused with the interpretation of the Chi-Square Test. Please let me know what to write, and how to write the results shown in a table, explained in a simple way. Thanks in advance.
If you've looked at the answers above, and you're still confused, what I think you are having trouble with is the way you determine "significance".
The idea is that asking "is this significant?" is really asking "is this anything other than a random finding that could have occurred by chance?". For example, the simplest random device is flipping a coin: each flip has a 50% chance of going one way or the other. So we would say p = 0.5. Someone spent a very long time flipping a coin to show that, although the proportion was not exactly 0.5, there were perhaps 5 million and one heads and 5 million and three tails!
Why do we want to assess the likelihood that only chance is at play? Because usually we've interfered with something and we want to know if what we have done has had an effect. Suppose we assess the p for a dice we take to a casino, trying to win something: we make a loaded dice that usually comes up with a 5. When we test this, we find that the probability of getting a score of 5 is not 1/6 but 2.5/6: we have got an effect from adjusting the centre of gravity of the dice!
If we throw the dice enough times, we can be certain that we'll make money when we play with this dice: but we have to throw the dice more than a few times to be sure. That's why the number of observations is important.
Now let's look at the Chi Square test (a lot of this is in the thread above, and I am repeating some of it here)
Let's look at a classroom of children, and test them to see if they have antibodies to Covid. There are 34 children and 16 have Covid antibodies. That means there are 18 that don't have antibodies.
These would be one of the margins of the Chi squared table. These are not fixed beforehand: we could have had a different number in the class, and factors outside our control have affected the antibodies.
Now we test the hypothesis that the ones with antibodies are the older ones: we divide the class into the older ones and the younger ones. (This is a factor we CAN control, which sometimes makes the logic more tricky: Fisher's exact test was famously done with fixed factors, tasting tea that had the milk put in the cup first or second, in an exact 50/50 ratio.)
So now we have 17 older, 17 younger, and 34 in total, the other margins of the table.
I now tell you that in the younger group, 8 have antibodies. You now have enough information to fill in the table! Do this step for yourself: write the numbers down in a box! What it tells us is that it doesn't look like age has anything to do with antibodies, because the proportion 8 out of 17 is exactly the same as the overall proportion of 16 out of 34. It is this proportion of 16 out of 34 that allows us to "expect" a ratio of 8 out of 17.
What would we consider an unexpected result? Well, one where the distribution of antibodies was different from the overall expected ratio: say 10 children in the younger group have antibodies, and only 7 do not. Note that, given the results for the whole group (16 with antibodies), in the older children there can then only be 6 with antibodies. Now the distributions of antibodies in the two age groups are NOT what we would expect. The Chi squared test just works out how unexpected the numbers are, by calculating the difference between what's observed and what would be expected. The bigger the number we get, the more unexpected the distribution. In words, it's the sum, over all the cells, of the (observed - expected) squared / expected values (hard to write in this text box because the symbols are not available!).
But you'd start by saying that in the young children with antibodies, the number was 10 and you expected 8, so the difference is 2, the square of that is 4, and the result is that 4 divided by the expected number 8, which comes to 0.5. Do the same for the other 3 boxes. Add the results: that's your Chi squared. For the example I have given, the difference is not so unexpected: Chi sq is 1.889 and the probability (p, which I got from a lookup table, WITH 1 DEGREE OF FREEDOM) is 0.169, which is about one in six. So IF there were no influence of age on the prevalence of antibodies (that's the null hypothesis), you would get this result in about one of six classrooms of children.
If we change the results: say 12 of the younger kids have antibodies, rather than the expected 8. Now we have a box that contains 12 young kids with and 5 without, so there will only be 4 older kids with and 13 without.
Now we get Chi sq of 7.55 and that gives a probability of 0.006. That's pretty small: we might conclude that younger children are more likely to have antibodies.
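Both of the classroom examples above can be checked with a short script (a Python sketch; 1 degree of freedom and no continuity correction, so it matches the hand calculation and the lookup table rather than R's default Yates-corrected chisq.test):

```python
import math

def chi2_2x2(table):
    """Chi-square statistic and p-value (1 df, no continuity
    correction) for a 2 x 2 table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # shortcut formula: n * (ad - bc)^2 / (product of the four margins)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))   # tail probability at 1 df
    return chi2, p

# rows: with / without antibodies; columns: younger / older
print(chi2_2x2([[10, 6], [7, 11]]))   # about (1.889, 0.169)
print(chi2_2x2([[12, 4], [5, 13]]))   # about (7.556, 0.006)
```

The first table gives the "one classroom in six" result; the second is far enough from the expected 8-per-group split that the null hypothesis looks untenable.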
I hope that this explanation has been aimed at the question you asked!