Does highly dichotomized continuous data violate the assumption of normality for most parametric tests?

03 March 2019

For those of you with expertise in measurement and replication, what would you recommend in the following scenario:

We are preparing to replicate a between-subjects experiment where participants (N = 135) were randomly assigned to read one of 3 versions of a story (1 experimental condition and 2 control conditions) and then asked whether the protagonist in the story "knows" or "only believes" the stated claim (binary response; 3x2 chi square).

The original authors found no significant difference between the experimental condition and the "knowledge" control condition (Fisher’s p = .164) and a large, significant difference between the experimental condition and the "ignorance" control condition (Fisher’s p = .001, Cramér's V = .509). However, they claim that a more recent study suggests that if people are asked the same question using a scaled response type (e.g., visual analogue scale, Likert-type), there may actually be a small real difference (d ≈ .1) between the experimental condition and the knowledge control that went undetected in their first study, suggesting that people might respond with more nuance given the opportunity.

Therefore, we conducted a pretest (N = 165) using a visual analogue scale (VAS; where 0 represents "only believes" and 100 represents "knows") in lieu of the original binary response format. Based on the pretest data though, it appears that participants still responded quite dichotomously (most responses were grouped near 0 or 100 in each condition, but less so in the experimental condition; see attached figure).

If this were your replication study, would you stick to a binary response type (knows/only believes) or try using a potentially more sensitive measure (VAS, Likert-type) to obtain more fine-grained data? With continuous data, we could also dichotomize the responses and calculate the non-parametric statistics in order to compare them to the parametric results, but I'm worried this may violate some statistical assumptions. Namely, wouldn't highly dichotomized data violate the assumption of normality for most parametric tests?
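The comparison described above could be set up by splitting the VAS ratings at the scale midpoint and running Fisher's exact test on the resulting 2x2 table. The sketch below is only an illustration using the Python standard library: the ratings, the cutoff of 50, and the function names `dichotomize` and `fisher_exact_2x2` are assumptions made for the example, not the pretest data or the original authors' procedure.

```python
from math import comb

def dichotomize(vas_scores, cutoff=50):
    """Code each 0-100 VAS rating as 1 ("knows") if above the cutoff, else 0 ("only believes")."""
    return [1 if s > cutoff else 0 for s in vas_scores]

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))

# Hypothetical ratings, invented for illustration only (NOT the pretest data)
experimental = [2, 5, 40, 55, 60, 95, 98, 100]
knowledge_control = [3, 90, 92, 95, 97, 99, 100, 100]

exp_bin = dichotomize(experimental)
ctl_bin = dichotomize(knowledge_control)

# Table layout: (knows_exp, believes_exp, knows_ctl, believes_ctl)
table = (sum(exp_bin), len(exp_bin) - sum(exp_bin),
         sum(ctl_bin), len(ctl_bin) - sum(ctl_bin))
p_value = fisher_exact_2x2(*table)
print(table, round(p_value, 3))
```

The cutoff is a judgement call; when most ratings sit near 0 or 100, the result should be fairly insensitive to where exactly the split falls, which is itself worth checking.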

Daniel Wright

What was the distribution like for the "more recent" study? What methods did they use? What was the specific question, and does it make sense with a scale? (A question such as "is there at least one" does not make sense with a scale, nor does a confidence scale if people either know it or not.)

Short answer to your question: if the distribution is clearly not normal, is it normal? No.

Matthew Kerry

Use an ordinal scale and conduct a non-parametric test (Mann-Whitney U); you should have significance.

Hope this helps

Matt

Thom S Baguley

I'm not sure I understand the question, but I think the simple answer is no. Parametric tests don't require normally distributed predictors or outcomes. The most common ones are general linear models (or closely related models), and they assume normally distributed errors.
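To make the distinction between outcomes and errors concrete, here is a small simulated sketch in Python (standard library only). Everything in it is assumed for illustration: two hypothetical groups with means of 10 and 90 and normal noise within each group. The pooled outcome is strongly bimodal, yet the residuals around the group means are unimodal.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical two-group design: means far apart, normal noise within groups
group_a = [random.gauss(10, 5) for _ in range(200)]
group_b = [random.gauss(90, 5) for _ in range(200)]
outcome = group_a + group_b

# Residuals of the one-way model: each score minus its own group mean
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
residuals = [y - mean_a for y in group_a] + [y - mean_b for y in group_b]

def share_near_centre(values, width):
    """Proportion of values within `width` of the sample mean."""
    centre = statistics.mean(values)
    return sum(abs(v - centre) <= width for v in values) / len(values)

# Pooled outcome is bimodal: hardly anything lies near its overall mean
print("outcome near its mean:", share_near_centre(outcome, 10))
# Residuals are unimodal: most lie near zero
print("residuals near zero:  ", share_near_centre(residuals, 10))
```

Note that this only illustrates the general point. In the pretest described in the question, responses cluster near 0 and 100 within each condition, so the residuals themselves would be bimodal, and the normal-errors assumption would still be in doubt.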

I also think VAS scales can induce artefacts - notably anchoring effects because of slider placement.

The statistical power issue is slightly different - if the response is overwhelmingly dichotomous then there will be minimal power loss by using a binary response and so logistic regression might be a sensible option.
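As a rough sketch of the binary-response route: with a single two-level predictor, the logistic regression slope estimate equals the log odds ratio of the 2x2 table, so it can be computed by hand. The counts and the function name `log_odds_ratio` below are hypothetical, chosen only for illustration; a full analysis of the three conditions would normally use a statistics package.

```python
from math import exp, log, sqrt

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and Wald standard error for the table [[a, b], [c, d]].

    With one binary predictor, the estimate equals the logistic
    regression slope. Requires all four cells to be non-zero.
    """
    estimate = log((a * d) / (b * c))
    std_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, std_error

# Hypothetical counts, invented for illustration (NOT the original or pretest data)
knows_ctl, believes_ctl = 33, 12   # control condition
knows_exp, believes_exp = 15, 30   # experimental condition

estimate, std_error = log_odds_ratio(knows_ctl, believes_ctl, knows_exp, believes_exp)

# Approximate 95% Wald interval, back-transformed to the odds-ratio scale
ci_low = exp(estimate - 1.96 * std_error)
ci_high = exp(estimate + 1.96 * std_error)
print(f"odds ratio = {exp(estimate):.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```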
