Hello,

I have a data set with N = 369 individuals measured at a single time point. The goal of the study is to create an assessment of psychological safety (PS). The assessment is a self-report measure asking participants to indicate how psychologically safe they feel using a unipolar 5-point Likert scale ranging from 1 (not at all) to 5 (extremely).

In addition to the assessment I am creating, I also measured a number of demographic variables (e.g., age, salary) and a few additional measures of team environment for validation (e.g., an existing measure of PS, level of team interdependence).

My primary goal is to run an exploratory factor analysis (EFA). This is the first time anyone has conceptualized PS as multidimensional, so the main aims are to uncover the potential factor structure of PS and to identify candidate items for deletion.

To prepare for the EFA, I am cleaning the data following the recommendations in (the excellent) Tabachnick & Fidell (2013, 6th ed.).

I am now checking the data for multivariate outliers, starting with Mahalanobis distance, but I cannot find explicit guidelines on which variables to include as "IVs" in the analysis.

QUESTION: Which variables should I include in my search for multivariate outliers? Do I include all variables, or only my target variables?

Specifically, do I include only the variables that make up the item pool for my forthcoming PS assessment? Or do I include all the PS items AND the demographic variables, the existing PS assessment, the interdependence measure, etc.?

I ran the Mahalanobis distance analysis twice, once with each approach, and found substantial differences:

  • TIME 1 - With just the PS assessment variables --> I identified n = 28 multivariate outliers.
  • TIME 2 - With PS items + demographics, etc. --> I identified n = 10 multivariate outliers (all identified as outliers in the TIME 1 analysis).

Syntax I am using - the bolded variables are the ones I am questioning if I should include or not:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Subjno
  /METHOD=ENTER Age Salary Edu WorkStructure TeamSize Tenure_OnTeam JapaneseBizEnviron EdmondsonPS_TOT Interdep_TOT Valued_TOT PS_1 PS_48 PS_141 PS_163 PS_43 PS_53 PS_73 PS_133 PS_135 PS_19 PS_60_xl26 PS_93 PS_106_xl26 PS_143 PS_58 PS_86 PS_182 PS_56 PS_69 PS_103 PS_164 PS_22 PS_35 PS_91 PS_30 PS_59 PS_63 PS_90 PS_131 PS_140
  /RESIDUALS = OUTLIERS(MAHAL)
  /SAVE MAHAL.

(**Note, PS assessment var list is truncated b/c large number)
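For context, here is a rough sketch of how the saved distances can be turned into an outlier flag. It assumes the chi-square criterion at p < .001 with df equal to the number of variables entered, which is how I read Tabachnick & Fidell. It also assumes SPSS saved the distances under its default name, MAH_1. The df of 40 is only a placeholder (the count of the names shown in the truncated list above), and p_mah and mv_outlier are names made up for illustration.

```spss
* Sketch only: assumes the REGRESSION run above saved the distances as MAH_1.
* Placeholder df = 40; replace with the actual number of variables on METHOD=ENTER.
COMPUTE p_mah = 1 - CDF.CHISQ(MAH_1, 40).
* Flag cases whose distance is significant at p < .001 (1 = flagged, 0 = not).
COMPUTE mv_outlier = (p_mah < .001).
EXECUTE.
FREQUENCIES VARIABLES=mv_outlier.
```

Because the df changes with the number of variables entered, the cutoff differs between my two runs, which may be part of why the outlier counts differ.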
