Hi, I'm researching the relationship between two qualitative variables and I want to test for significant differences, but my sample size is approximately 4,750 people, so... what statistical test should I use?
César D. González, in your case the Chi-square test of independence is the most suitable method for examining a potential association between two qualitative variables. This test determines whether a statistically significant association exists between two categorical variables and is widely used in many fields, including the social sciences.

To carry out a Chi-square test of independence, first construct a contingency table showing the frequency of each combination of values of the two variables. Then use statistical software to compute the Chi-square statistic and its p-value. The p-value is the probability of obtaining a Chi-square statistic at least as extreme as the observed value, assuming the null hypothesis is true (i.e., that there is no association between the two variables). If the p-value falls below the chosen significance level, typically 0.05, you can reject the null hypothesis and conclude that there is a statistically significant association between the two variables.

I have attached some sample results for reference, in which there is no relationship between the two categories. Without the complete data I can't add further comments. For discussion specific to this question, you can reach me on WhatsApp for further fruitful outcomes: https://wa.me/+923440907874
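For instance, in R those steps might look like the following minimal sketch (dat, group, and outcome are placeholder names, not anything from your actual data):

```r
# Build the contingency table from the two categorical variables
tab <- table(dat$group, dat$outcome)

# Chi-square test of independence
res <- chisq.test(tab)
res$statistic   # Chi-square statistic
res$p.value     # p-value under the null of no association
res$expected    # expected counts (check that none are very small)
```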
Having 5000 subjects isn't a problem for common statistical tests.
However, hypothesis tests become increasingly powerful as the sample size grows. In practice, this means you will often find a significant result even when the difference or association is small in magnitude.
The solution here is to be sure to look at effect size, either as a standardized effect size statistic or as a meaningful unstandardized measure (like a difference in proportions or means).
Finally, be sure to assess the practical importance of the results. Sometimes a significant hypothesis test and a nominally large effect size may not mean anything in the real world.
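To illustrate the large-sample point with a quick sketch in R (the counts below are made up, not from any real data): with thousands of subjects, a difference of a few percentage points can easily come out "significant", so report the difference itself alongside the p-value.

```r
# Hypothetical 2x2 counts (successes and failures per group)
counts <- matrix(c(620, 1780,    # group A: yes, no
                   520, 1830),   # group B: yes, no
                 nrow = 2, byrow = TRUE)

pt <- prop.test(counts)
pt$p.value         # likely small at this sample size
diff(pt$estimate)  # unstandardized effect: difference in proportions (a few points)
```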
As Sal Mangiafico mentions, testing a hypothesis that the association is nil is usually of little value with a sample that size (though more information is needed). Assuming you have several values for each variable, you might consider correspondence analysis or the RC models. I'll attach the notes I use on these; they are still in draft. I don't know whether they will be relevant, since I don't know the details of your data or research questions.
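If it helps, here is a rough sketch of simple correspondence analysis in R using the FactoMineR package (just one of several implementations; MASS::corresp is another), applied to a two-way contingency table tab:

```r
library(FactoMineR)

# Correspondence analysis of a two-way contingency table
ca <- CA(as.data.frame.matrix(tab), graph = FALSE)
ca$eig         # inertia explained by each dimension
ca$row$coord   # coordinates of the row categories
ca$col$coord   # coordinates of the column categories
```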
I note that I was not specific in my question. I want to compare various qualitative variables between two groups (men and women). The variables are ordinal (like age group and socioeconomic class) and nominal (yes/no for the presence of factors such as relationship problems, presence of a mental disorder, etc.); I have approximately 35 variables. The objective is to determine the differences in sociodemographic variables and risk factors among people with suicidal behavior. I was thinking of using the Chi-square test, but given the number of variables and the amount of data, I think I need a better analysis to present the results. I have been looking at the possibility of using multiple correspondence analysis, and I would like to know if you think it would be a good option.
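What I have been looking at would be something like this sketch in R with FactoMineR (vars is a placeholder data frame in which the variables are coded as factors; nothing here is specific to my actual dataset):

```r
library(FactoMineR)

# vars: data frame of factor-coded variables; column 1 is Sex,
# used as a supplementary variable so it is projected onto the
# solution rather than driving it
mca <- MCA(vars, quali.sup = 1, graph = FALSE)
mca$eig              # inertia per dimension
mca$var$coord        # category coordinates
mca$quali.sup$coord  # where the Sex categories fall
```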
So is your goal to create some summary variable, or set of summary variables, from the 35? If so, the package mirt (https://cran.r-project.org/web/packages/mirt/index.html) allows the variables to be of different types. However, with 35 variables I'd recommend some screening to decide which to include.
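A rough sketch of that idea (assuming the items have been recoded numerically, 0/1 for the yes/no items and 0, 1, 2, ... for the ordinal ones; items is a placeholder data frame):

```r
library(mirt)

# One latent dimension; binary items as 2PL, ordinal items as graded response
types <- ifelse(sapply(items, function(x) length(unique(na.omit(x)))) == 2,
                "2PL", "graded")
fit <- mirt(items, model = 1, itemtype = types)

scores <- fscores(fit)  # person-level summary variable
```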
I'm not entirely clear what the goal is, but it sounds like you want to conduct multiple tests to see whether each of your dependent variables is associated with your Sex variable.
It's fine to do this as a kind of initial screening. This is often done in a big correlation matrix for continuous variables. You can do the same kind of thing with what you have.
Which tests you use depends partly on what's common in your field, and on whether you want to designate dependent and independent variables or simply treat the variables as "correlated" or "associated".
Chi-square makes sense for nominal-nominal pairs of variables. Be sure to report an effect size statistic like phi or Cramér's V. Or you can just report proportions, whichever makes more sense for your audience and what you want to do.
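For example, Cramér's V can be computed directly from the Chi-square output (a sketch; packages such as rcompanion or vcd also provide it):

```r
# tab: two-way table of Sex by one nominal variable
chi <- chisq.test(tab, correct = FALSE)
n   <- sum(tab)
V   <- sqrt(as.numeric(chi$statistic) / (n * (min(dim(tab)) - 1)))
V   # for a 2x2 table this equals |phi|
```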
For ordinal-nominal pairs, use either the Cochran-Armitage trend test or the Wilcoxon-Mann-Whitney test, and then probably the rank-biserial correlation as the effect size.
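In R that might look like the sketch below (score, sex, events, and totals are placeholder names; the rank-biserial formula here ignores ties):

```r
# Wilcoxon-Mann-Whitney: numeric-coded ordinal outcome by Sex
wt <- wilcox.test(score ~ sex, data = dat)

# Rank-biserial correlation from the Mann-Whitney U (wt$statistic)
n1   <- sum(dat$sex == levels(dat$sex)[1])
n2   <- sum(dat$sex == levels(dat$sex)[2])
r_rb <- 2 * as.numeric(wt$statistic) / (n1 * n2) - 1

# Cochran-Armitage-type trend test: counts of one sex (events) out of
# all people (totals) at each level of the ordinal variable
prop.trend.test(events, totals)
```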
But this step is usually just an initial step to let your readers know what is correlated with what. After all, if socioeconomic status is associated with Sex, and relationship problems are associated with Sex, does this tell you what you need to know?
Thanks everyone, I will try to use all your recommendations to find the best way to do good research.
Sal Mangiafico, one last question: I believed that the Chi-square test could not be used with such large samples. Could it be adapted to the amount of data I have?
The truth is, while I was exploring possibilities, the Chi-square test was my first option. While studying how to use it, I found a web portal that gave a maximum sample size of 500 for Chi-square analysis, and a book (I don't remember precisely which one) that gave a maximum sample size of 2,000. The reason they mentioned was that with a very large sample size, any small variation could be statistically significant.
In retrospect, though, those were the only two sources (one of them unofficial) that mentioned this problem among all the literature I reviewed.
César D. González, yes, that's a concern I tried to address above. But I certainly wouldn't set an upper limit on the sample size at which a test is useful.
The main point is to not treat the p-value from a test as something magical. Rejecting the null hypothesis is usually not all that interesting by itself. You have to assess the size of the effect and the practical importance of the results.