Which statistical test is to be used to analyze a question with only a yes or no answer from 52 respondents?

Since each of these questions is yes/no, I am assuming you want to know whether you can say that a majority of respondents say "yes" or a majority say "no." If that is the case and this is a random sample, you would be conducting a two-tailed hypothesis test with the null hypothesis that the proportion of yeses is .50 (equivalently, that the mean of the responses, coded 1 = yes and 0 = no, is .50). The simplest test for this would be a sign test. Any basic statistics text describes the simple process for running this test. The only assumption for this test is random sampling, so that the results can be applied to the full population. The greatest disadvantage is probably the experiment-wise error rate, since you will be conducting 300 such tests. I recommend that you run each test at a very small error rate (such as .001) to maintain a reasonable experiment-wise error rate.
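As a rough illustration, the test described above can be sketched in Python using only the standard library. The function names and the count of 35 yeses out of 52 are made up for the example. The two-sided p-value sums the probabilities of all outcomes that are no more likely than the observed one, which is one common convention and not the only one. The experiment-wise figures assume either the Bonferroni bound or fully independent tests.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial (sign) test p-value for k yeses out of n.

    Sums the probabilities, under H0: p = p0, of every outcome that is
    no more likely than the observed count k.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))


def familywise_bonferroni(alpha: float, m: int) -> float:
    """Upper bound on P(at least one false positive) across m tests."""
    return min(1.0, alpha * m)


def familywise_independent(alpha: float, m: int) -> float:
    """P(at least one false positive) if the m tests were independent."""
    return 1 - (1 - alpha) ** m


# Hypothetical question: 35 of the 52 respondents answered yes.
p_value = binomial_two_sided_p(35, 52)
print(f"two-sided p = {p_value:.4f}")

# Experiment-wise error rate for 300 tests, each run at alpha = .001.
print(f"Bonferroni bound:  {familywise_bonferroni(0.001, 300):.3f}")
print(f"independent tests: {familywise_independent(0.001, 300):.3f}")
```

With 300 tests at .001 each, the Bonferroni bound on the experiment-wise error rate is .30, and the figure for independent tests is about .26. Holding the experiment-wise rate to .05 under the Bonferroni bound would need a per-test level of roughly .05 / 300, or about .00017.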

Mohialdeen Alotumi

The choice of a statistical test is dictated by the purpose of the analysis, the nature of the data, and the measurement level of the variables. That said, if you intend to test whether the observed frequencies for the two categories (i.e., yes vs. no) differ significantly from an expected value, you could use the binomial sign test for a single sample, which is a nonparametric test for categorical (nominal) data. For details, see Chapter 9 of Sheskin's (2011) handbook, cited in full below.

Sheskin, D. J. (2011). Handbook of parametric and nonparametric statistical procedures (5th ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429186196

Good luck,

Bruce Weaver

Muhammad Waqas gave this example of a question:

For example, the question is: "Can the application of Mix Reality (Visualization) overcome the barrier (Difficult to convince the client)?"

Is each of the 300 questions about a different (unique) "barrier"? Or can there be multiple questions per barrier, asking if application of other methods or approaches (other than Mix Reality) can overcome that same barrier (difficult to convince client)?

Later, you wrote:

So, simply I use the method that has a greater percentage of yes or no.

That suggests you might be comparing different approaches to overcoming the same barrier(s). But if the different approaches have different costs, surely you would also factor in cost, not just which approach had the highest proportion of endorsements. Would you not?

It seems to me we still do not have enough information about what you are trying to do to offer good advice. Based on what you've said so far, I'm still not convinced that null hypothesis significance testing (NHST) is needed, for example.

Muhammad Waqas

Bruce Weaver, I will elaborate more. I have 10 applications/uses of automation in construction technologies (BIM: 4 applications, Mix Reality: 3 applications, and so on), and I have 30 barriers in total. There are 30 questions for each application, one per barrier. For example: "Can the application of Mix Reality (Visualization) overcome the barrier (Difficult to convince the client)?" Another question is "Can the application of Mix Reality (Visualization) overcome the barrier (Lack of suppliers)?", and so on. For each question, there are 52 responses in the form of YES or NO answers. My question is how to analyze this type of data and how to select the barriers that can be overcome.

Bruce Weaver

That helps a bit, but I'm still guessing (somewhat) as to what your data file looks like. Does it look like the CSV file I have attached? Or could it be restructured to look like that?

If it does look like that, are you trying to generate output that shows for each barrier the application(s) that have the highest proportion of YES responses? Something like this?

. list barrier app pYes if flag, clean noobs

    barrier   app       pYes
          1     7   .5192308
          2     1   .6730769
          3     2   .5192308
          4     8   .5192308
          4    10   .5192308
          5     3   .5192308
          6     4   .4807692
          7     9   .5192308
          8     7         .5
          9     3   .5192308
         10     9   .5384616
         11     1   .4615385
         12     6   .5192308
         13     3   .5576923
         14     2   .5576923
         15     4         .5
         16     7   .5192308
         17     4   .5192308
         18     1   .4423077
         18     3   .4423077
         19     2   .5192308
         20     7   .5384616
         21     8         .5
         21     9         .5
         22     2   .5192308
         23     1   .4807692
         24     2   .5384616
         25     3   .5192308
         25     6   .5192308
         26     8   .5384616
         27     6   .5384616
         28     4         .5
         29     7   .4615385
         30     1   .5384616

Thanks for clarifying.

By the way, if that is what you are trying to do, here is the Stata code I used to generate the listing above (using the CSV data I attached).

* Reduce the data to the means of the 52 1=Y, 0=N responses
collapse (mean) response, by(barrier app)
rename response pYes // pYes = p(Yes) for the 52 respondents

* Flag the max value of pYes for each barrier
bysort barrier: egen maxpYes = max(pYes)
generate flag = pYes == maxpYes

list barrier app pYes if flag, clean noobs
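For readers without Stata, roughly the same collapse-and-flag logic can be sketched in plain Python. This assumes long-format rows of (barrier, app, response) with response coded 1 = yes and 0 = no, matching the variable names in the Stata code. The function name and the tiny data set are invented for illustration and are not the attached CSV.

```python
from collections import defaultdict
from fractions import Fraction


def top_apps_per_barrier(rows):
    """Return {barrier: [(app, p_yes), ...]} for the app(s) with the
    highest proportion of yes responses on each barrier (ties kept)."""
    yes = defaultdict(int)
    n = defaultdict(int)
    for barrier, app, response in rows:
        yes[(barrier, app)] += response
        n[(barrier, app)] += 1

    # Exact proportions, so that ties are detected without float issues.
    p_yes = {key: Fraction(yes[key], n[key]) for key in n}

    # Highest proportion observed for each barrier.
    max_p = {}
    for (barrier, _app), p in p_yes.items():
        max_p[barrier] = max(p, max_p.get(barrier, p))

    # Keep every app that reaches its barrier's maximum.
    flagged = defaultdict(list)
    for (barrier, app), p in sorted(p_yes.items()):
        if p == max_p[barrier]:
            flagged[barrier].append((app, float(p)))
    return dict(flagged)


# Made-up data: 2 barriers, 2 apps, 4 or 2 respondents per question.
rows = [
    (1, 1, 1), (1, 1, 0), (1, 1, 1), (1, 1, 0),  # barrier 1, app 1: 2/4 yes
    (1, 2, 1), (1, 2, 1), (1, 2, 1), (1, 2, 0),  # barrier 1, app 2: 3/4 yes
    (2, 1, 1), (2, 1, 0),                        # barrier 2, app 1: 1/2 yes
    (2, 2, 0), (2, 2, 1),                        # barrier 2, app 2: 1/2 yes
]
result = top_apps_per_barrier(rows)
print(result)
```

As in the Stata listing, a barrier appears with more than one app when several apps tie for the highest proportion.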
