I have to analyze 300 questions answered by 52 respondents, so for every question I received 52 responses. I need help with how to analyze the data to build a framework.
Please tell us more about the questions. For example, are there groups of related questions, or 300 separate and distinct questions? Are you interested in the proportion of questions with YES responses, or something else? What are your research questions? Thanks for clarifying.
Bruce Weaver, thanks for your response. All of these are separate questions. For example, one question is: "Can the application of Mix Reality (Visualization) overcome the barrier (Difficult to convince the client)?" For this specific question, I will get 52 responses. So should I simply use the method that has a greater percentage of yes or no? Or is there another statistical method/technique that gives significant results?
A result can only be significant in relation to a particular statistical hypothesis. You don't seem to have one. I'd suggest an exploratory analysis. I'd start with some clustering to identify groups of questions with similar response patterns.
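One way to act on the clustering suggestion is agglomerative (hierarchical) clustering of the questions: treat each question as the vector of its 52 answers (1 = yes, 0 = no) and use the proportion of respondents who answered two questions differently as the distance. This is a stdlib-only sketch under those assumptions; the function names and toy data are hypothetical, and average linkage is just one reasonable choice.

```python
from itertools import combinations

def mismatch_distance(a, b):
    """Proportion of respondents who answered the two questions differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_linkage(patterns, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of question indices."""
    clusters = [[i] for i in range(len(patterns))]
    # Precompute the distance between every pair of questions
    dist = {(i, j): mismatch_distance(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean distance over all question pairs spanning the two clusters
            avg = sum(d(i, j) for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

# Hypothetical toy data: 4 questions answered by 6 respondents
toy_patterns = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
print(average_linkage(toy_patterns, 2))
```

With SciPy installed, `scipy.cluster.hierarchy.linkage(X, method="average", metric="hamming")` followed by `fcluster` should give the same kind of grouping for a 300 × 52 matrix `X`.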
Since each of these questions is yes/no, I am assuming you want to know whether you can say that a majority of respondents say "yes" or a majority say "no." If that is the case and this is a random sample, you would be conducting a two-tailed hypothesis test with the null hypothesis that the proportion of yeses is .50 (equivalently, that the mean of the 1 = yes, 0 = no responses is .50). The simplest test for this is a sign test. Any basic statistics text describes the simple procedure for running this test. The only assumption of this test is random sampling, which is what allows the results to be applied to the full population. The greatest disadvantage is probably the experiment-wise error rate, since you will be conducting 300 such tests. I recommend that you run each test at a very small error rate (such as .001) to maintain a reasonable experiment-wise error rate.
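To see why such a small per-test level is recommended, the experiment-wise error rate can be computed directly as 1 - (1 - alpha)^m for m tests. That formula assumes the 300 tests are independent, which they probably are not (the same 52 people answer every question), so treat it as a rough guide; the Bonferroni per-test level, target rate divided by m, does not need independence. A minimal sketch with hypothetical function names:

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(target_rate, m):
    """Per-test level that keeps the experiment-wise rate at or below target_rate."""
    return target_rate / m

m = 300
for alpha in (0.05, 0.01, 0.001):
    print(f"per-test alpha {alpha}: experiment-wise rate = "
          f"{familywise_error_rate(alpha, m):.3f}")
print(f"Bonferroni per-test alpha for a .05 experiment-wise rate: "
      f"{bonferroni_alpha(0.05, m):.6f}")
```

Even at .001 per test, the chance of at least one false "significant" question among 300 independent tests is roughly .26.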
Choosing a proper statistical test is dictated by the purpose of the analysis, the nature of the data, and the measurement level of the variables. That said, if you intend to test whether the observed frequencies for the two categories (i.e., yes vs. no) differ significantly from their expected values, you could use the binomial sign test for a single sample, which is a nonparametric test for categorical or nominal data. For details, see chapter 9 of Sheskin’s (2011) book, which is fully cited below.
Sheskin, D. J. (2011). Handbook of parametric and nonparametric statistical procedures (5th ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429186196
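For a single question, the exact two-sided binomial (sign) test of H0: P(yes) = .50 with n = 52 can be sketched with the standard library alone. The two-sided p-value here sums the probabilities of all outcomes no more likely than the observed count, which should agree with `scipy.stats.binomtest(k, n, p=0.5).pvalue` in recent SciPy versions; the helper that finds the smallest significant YES count is a hypothetical convenience, not part of any library.

```python
from math import comb

def binomial_sign_test(k, n, p=0.5):
    """Exact two-sided binomial test of H0: P(yes) = p, given k YES out of n."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum every outcome at most as likely as the observed one;
    # the small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))

def min_yes_for_significance(n, alpha, p=0.5):
    """Smallest YES count (at or above n/2) whose two-sided p-value is below alpha."""
    for k in range((n + 1) // 2, n + 1):
        if binomial_sign_test(k, n, p) < alpha:
            return k
    return None

n = 52
for yes_count in (30, 35, 40):
    print(yes_count, round(binomial_sign_test(yes_count, n), 4))
print("YES needed at alpha = .001:", min_yes_for_significance(n, 0.001))
```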
For example, the question is: "Can the application of Mix Reality (Visualization) overcome the barrier (Difficult to convince the client)?"
Is each of the 300 questions about a different (unique) "barrier"? Or can there be multiple questions per barrier, asking if application of other methods or approaches (other than Mix Reality) can overcome that same barrier (difficult to convince client)?
Later, you wrote:
So, simply I use the method that has a greater percentage of yes or no.
That suggests you might be comparing different approaches to overcoming the same barrier(s). But if the different approaches have different costs, surely you would also factor in cost, not just which approach had the highest proportion of endorsements. Would you not?
It seems to me we still do not have enough information about what you are trying to do to offer good advice. Based on what you've said so far, I'm still not convinced that null hypothesis significance testing (NHST) is needed, for example.
Bruce Weaver, let me elaborate. I have 10 applications of automation in construction technologies (4 BIM applications, 3 Mixed Reality applications, and so on) and 30 barriers in total, so there are 30 questions for each application (10 × 30 = 300 questions). For example: "Can the application of Mix Reality (Visualization) overcome the barrier (Difficulty convincing the client)?" Another question is "Can the application of Mix Reality (Visualization) overcome the barrier (Lack of Suppliers)?", and so on. For each question, there are 52 responses, each either YES or NO. My question is: how do I analyze this type of data, and how do I select the barriers that can be overcome?
That helps a bit, but I'm still guessing (somewhat) as to what your data file looks like. Does it look like the CSV file I have attached? Or could it be restructured to look like that?
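If the data currently sit in a "wide" file (one row per respondent, one column per question), restructuring them into one row per respondent × question makes the barrier/application analysis straightforward. This is a minimal sketch under loudly hypothetical assumptions: a `respondent` ID column, question columns named `b<barrier>_a<app>` (e.g. `b3_a7`), and YES/NO cells; the real file's column names will differ.

```python
import csv
import io
import re

# Hypothetical naming convention: b<barrier>_a<app>, e.g. b3_a7
COLUMN = re.compile(r"b(\d+)_a(\d+)")

def wide_to_long(csv_text):
    """Return (respondent, barrier, app, response) rows with response 1=YES, 0=NO."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rid = record["respondent"]  # hypothetical ID column
        for name, value in record.items():
            m = COLUMN.fullmatch(name)
            if m is None:
                continue  # skip the ID and any other non-question columns
            if value is None or not value.strip():
                continue  # treat blank cells as missing rather than NO
            answer = 1 if value.strip().upper() in ("Y", "YES") else 0
            rows.append((rid, int(m.group(1)), int(m.group(2)), answer))
    return rows

sample = """respondent,b1_a1,b1_a2,b2_a1,b2_a2
r01,Y,N,N,Y
r02,Y,Y,N,N
"""
for row in wide_to_long(sample):
    print(row)
```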
If it does look like that, are you trying to generate output that shows for each barrier the application(s) that have the highest proportion of YES responses? Something like this?
. list barrier app pYes if flag, clean noobs
barrier app pYes
1 7 .5192308
2 1 .6730769
3 2 .5192308
4 8 .5192308
4 10 .5192308
5 3 .5192308
6 4 .4807692
7 9 .5192308
8 7 .5
9 3 .5192308
10 9 .5384616
11 1 .4615385
12 6 .5192308
13 3 .5576923
14 2 .5576923
15 4 .5
16 7 .5192308
17 4 .5192308
18 1 .4423077
18 3 .4423077
19 2 .5192308
20 7 .5384616
21 8 .5
21 9 .5
22 2 .5192308
23 1 .4807692
24 2 .5384616
25 3 .5192308
25 6 .5192308
26 8 .5384616
27 6 .5384616
28 4 .5
29 7 .4615385
30 1 .5384616
Thanks for clarifying.
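For readers not using Stata, the selection behind a listing like the one above (reduce each barrier-application pair to pYes, then keep every application tied for the highest pYes within a barrier) can be sketched in Python. It assumes long-format rows of (barrier, app, response) with 1 = YES and 0 = NO; the function name and toy rows are hypothetical. Proportions are kept as exact fractions so that ties are detected reliably.

```python
from collections import defaultdict
from fractions import Fraction

def top_apps_per_barrier(rows):
    """Map each barrier to ([apps with the highest pYes], pYes); ties are all kept."""
    yes = defaultdict(int)
    total = defaultdict(int)
    for barrier, app, response in rows:
        yes[(barrier, app)] += response
        total[(barrier, app)] += 1
    # pYes for each barrier-application question, as an exact fraction
    p_yes = {key: Fraction(yes[key], total[key]) for key in total}
    best = {}
    for (barrier, app), p in sorted(p_yes.items()):
        if barrier not in best or p > best[barrier][1]:
            best[barrier] = ([app], p)
        elif p == best[barrier][1]:
            best[barrier][0].append(app)
    return {b: (apps, float(p)) for b, (apps, p) in best.items()}

toy_rows = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0),  # barrier 1, app 1: pYes = 2/3
    (1, 2, 1), (1, 2, 0), (1, 2, 0),  # barrier 1, app 2: pYes = 1/3
    (2, 1, 1), (2, 1, 0),             # barrier 2, app 1: pYes = 1/2
    (2, 2, 0), (2, 2, 1),             # barrier 2, app 2: pYes = 1/2 (tie)
]
for barrier, (apps, p) in top_apps_per_barrier(toy_rows).items():
    print(barrier, apps, round(p, 4))
```

As noted earlier in the thread, the highest proportion alone ignores both cost and sampling error, so a listing like this is a starting point rather than a decision rule.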
By the way, if that is what you are trying to do, here is the Stata code I used to generate the listing above (using the CSV data I attached).
* Reduce the data to the means of the 52 1=Y, 0=N responses
collapse (mean) response, by(barrier app)
rename response pYes // pYes = p(Yes) for the 52 respondents