How to run multiple linear regression with robust standard errors clustered on participant and item in R?

13 May 2022

Hi! I am trying to run the following multiple linear regression in R:

r ~ condition (1 = Control, 2 = Active Control, 3 = Treatment, 4 = Importance Treatment) + type (0 = false, 1 = true) + age (13, 14, 15, adult) + domain (1 = eco, 2 = health, 3 = society, 4 = culture)

r is a participant's intention to share a given headline. It is initially rated on a 6-point scale, and the score is then transformed into a variable between 0 and 1.

Participants are randomly assigned to one of 4 conditions, and we want to test the effect of the 2 treatment conditions versus the 2 controls on the intention to share false headlines (we predict that the treatments will reduce the sharing of fake news without affecting the sharing of real news).

In all conditions, the main task consists of rating the intention to share 24 headlines presented successively, half false and half true. So we also want to know the effect of the "type" of headline (true or false) and, less importantly, the effect of its category: within each of the 2 sets of headlines, 1/4 are about the economy, 1/4 about health, 1/4 about "society", and 1/4 about culture.

Finally, we are testing different age groups (13, 14, and 15 years old, plus an "adult" group pooling participants aged 25 to 36) to see whether the predicted effect varies with age.

The main hypotheses of our study are:

-that the treatments will improve discernment, defined as the difference between the intentions (r > 0.5) to share true news and the intentions to share fake news (i.e. they will genuinely improve the quality of sharing rather than merely cause general skepticism),

-that this effect will be larger for the headlines perceived as the most inaccurate (there will also be a headline-level analysis, based on a pretest). We suppose the treatment works by refocusing the participant's attention on the accuracy criterion, so its effect should be greater for the headlines generally perceived as the least accurate (we will calculate a mean perceived accuracy for each headline across participants), which are consequently the ones for which sharing intentions should drop the most under treatment,

-We have no particular prediction concerning age, which is precisely the novelty of the study (the literature on adolescents' ability to evaluate fake news and on their online sharing leaves the door open to quite different scenarios)

-and no prediction for domain either (we expect it won't play a role, as the headlines were chosen to be quite similar in tone, whatever the category)

Also, I need to cluster by participant (since there are repeated measures for each participant) and by headline (since each headline receives multiple ratings).

I thought of using lm_robust, but I don't know whether it can take 2 clusters directly?

I also wonder what the simplest way is to check for potential effects of other secondary measures (such as scores on a Cognitive Reflection Task, gender, socio-professional category (CSP), etc.): do I have to run regressions testing all simple effects, interactions, etc., or can I just add them to the global formula?

Thanks in advance!

Rebaz Yahya

Please see the following link:

https://www.r-bloggers.com/2021/05/clustered-standard-errors-with-r/
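For the two-cluster case specifically, here is a minimal sketch, not necessarily the method used in the linked post. It assumes the sandwich and lmtest packages, fits an ordinary lm(), and then asks vcovCL() for a covariance matrix clustered on both participant and headline. The data are simulated, and every column name (participant, headline, condition, type, age, domain, r) and sample size is a placeholder. The condition * type interaction is also an assumption, added because discernment is defined as a true-vs-false difference. As far as I know, the clusters argument of lm_robust accepts a single clustering variable, which is why this sketch goes through sandwich instead.

```r
# Sketch only: simulated data stand in for the real study.
# All column names and sample sizes below are assumptions.
library(sandwich)  # vcovCL(): cluster-robust covariance matrices
library(lmtest)    # coeftest(): coefficient table with a user-supplied vcov

set.seed(1)
n_part <- 80  # hypothetical number of participants
n_item <- 24  # 24 headlines, as in the question

# One row per participant x headline rating
d <- expand.grid(participant = 1:n_part, headline = 1:n_item)

# Participant-level factors, assigned at random for illustration
cond_levels  <- c("Control", "ActiveControl", "Treatment", "Importance")
cond_by_part <- sample(rep(cond_levels, length.out = n_part))
age_by_part  <- sample(rep(c("13", "14", "15", "adult"), length.out = n_part))
d$condition  <- factor(cond_by_part[d$participant], levels = cond_levels)
d$age        <- factor(age_by_part[d$participant])

# Headline-level factors: headlines 1-12 false, 13-24 true; domains cycle
d$type   <- factor(ifelse(d$headline <= 12, "false", "true"))
d$domain <- factor(c("eco", "health", "society", "culture")[(d$headline - 1) %% 4 + 1])

# Fake outcome: latent score with participant and headline effects,
# cut into a 6-point rating and rescaled to the 0-1 range
latent <- rnorm(n_part)[d$participant] + rnorm(n_item)[d$headline] + rnorm(nrow(d))
rating <- cut(latent, breaks = quantile(latent, 0:6 / 6),
              include.lowest = TRUE, labels = FALSE)
d$r <- (rating - 1) / 5

d$participant <- factor(d$participant)
d$headline    <- factor(d$headline)

# Plain OLS fit; condition * type adds the interaction behind "discernment"
fit <- lm(r ~ condition * type + age + domain, data = d)

# Two-way clustered covariance matrix: participant and headline
vc_twoway <- vcovCL(fit, cluster = ~ participant + headline)

# Same coefficients as summary(fit), but with two-way clustered standard errors
coeftest(fit, vcov. = vc_twoway)
```

The point estimates are identical to those of the plain lm() fit; only the standard errors, t statistics, and p-values change. With only 24 headlines, the headline dimension has few clusters, so the resulting standard errors should be read with some caution.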

David Eugene Booth

No offense, but r, before or after the transformation, is not suitable as a DV for OLS regression. Take a look at Kutner et al., Applied Linear Statistical Models, available in the z-library. BTW, Rosner's Fundamentals of Biostatistics discusses different types of DVs and the regression methods to deal with them; it is also in the z-library. Best wishes, David Booth

Camille Perault

Thank you very much for all your answers! I have unfortunately no choice but to use R... so I am going to go for the lme4 method.

Thanks again to you all!
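For readers wondering what the lme4 route could look like, here is a minimal sketch under stated assumptions. It uses lmer() with crossed random intercepts for participant and headline, which is one common way to handle ratings that are repeated over both. The data are simulated, and all column names and sample sizes are placeholders. Only condition and type are included as fixed effects to keep the example short; age, domain, or secondary measures could be added to the formula in the same way.

```r
# Sketch only: simulated data; column names and sizes are assumptions.
library(lme4)  # lmer(): linear mixed-effects models

set.seed(2)
n_part <- 80  # hypothetical number of participants
n_item <- 24  # 24 headlines, as in the question

# One row per participant x headline rating
d <- expand.grid(participant = 1:n_part, headline = 1:n_item)

# Condition is assigned per participant; type is a property of the headline
cond_levels  <- c("Control", "ActiveControl", "Treatment", "Importance")
cond_by_part <- sample(rep(cond_levels, length.out = n_part))
d$condition  <- factor(cond_by_part[d$participant], levels = cond_levels)
d$type       <- factor(ifelse(d$headline <= 12, "false", "true"))

# Fake outcome: participant and headline effects plus noise,
# cut into a 6-point rating and rescaled to the 0-1 range
latent <- rnorm(n_part)[d$participant] + rnorm(n_item)[d$headline] + rnorm(nrow(d))
rating <- cut(latent, breaks = quantile(latent, 0:6 / 6),
              include.lowest = TRUE, labels = FALSE)
d$r <- (rating - 1) / 5

d$participant <- factor(d$participant)
d$headline    <- factor(d$headline)

# Crossed random intercepts: one per participant and one per headline
fit_mm <- lmer(r ~ condition * type + (1 | participant) + (1 | headline),
               data = d)
summary(fit_mm)
```

Like the OLS version, this treats r as a continuous outcome, so the concern raised above about the dependent variable still applies.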
