I am a Ph.D. candidate (Information Systems) and a beginner in statistics and SEM. I have several questions in regard to my SEM model and analysis for my dissertation study and would appreciate advice on that.
My SEM model:
------------------------------
G1-G5 (latent variables) => GOV (latent variable)
Each of G vars has one or more indicators (measured vars)
OS1-OS3 (latent variables) => SPON (latent variable)
Each of OS vars has one or more indicators (measured vars)
Main hypothesized paths:
GOV => FS (dependent variable (DV), latent)
SPON => FS
FS = FS1-FS5 (FS construct's components, latent variables)
Each of FS vars has one or more indicators (measured vars)
Three control variables (latent, but may simplify to measured)
Legend: GOV represents governance, SPON - organizational sponsorship, FS - FLOSS Success.
Context: free/libre open source software (FLOSS) development
------------------------------
Questions:
1. If the leftmost part of my model (G => GOV, OS => SPON) represents a MIMIC model (Schumacker & Lomax, 2010), how should I handle this?
2. Is there any special way to handle SEM models with multiple levels of latent variables, such as [measured -> latent => latent => latent (DV) = latent
Hello Aleksandr
just a quick response for now - but with follow up questions first.
1. How is (G => GOV, and OS => SPON) a MIMIC model? I guess you have exogenous variables (the Gs), but what are the endogenous variables required to make the MIMIC model? Does the GOV construct directly impact on the five FS variables (FS1, FS2, FS3, FS4 and FS5)?
In regression language, it appears you have the following equations - is this correct?
GOV = a1.G1 +a2.G2 +a3.G3 +a4.G4 +a5.G5 + e1 (Eq. 1)
Where the a1 is the parameter estimate of the relationship that G1 has with GOV, and so on, and e1 is the amount of the GOV variable that is unexplained by G1, G2, G3, G4 and G5.
What now are the endogenous variables that GOV itself has an impact on to make this a MIMIC model?
Might it be that you have the following equation?:
FS = c1.GOV + e2 (Eq. 2)
Or do you have the following equations…?
FS1 = f1.GOV + e3 (Eq. 3)
FS2 = f2.GOV + e4 (Eq. 4)
FS3 = f3.GOV + e5 (Eq. 5)
FS4 = f4.GOV + e6 (Eq. 6)
FS5 = f5.GOV + e7 (Eq. 7)
2. By the look of things, you appear to have a variable, FS, that is formative, being a composite of FS1, FS2, FS3, FS4, and FS5.
If so, I assume you are considering modeling FS using the following regression equation:
FS = b1.FS1 +b2.FS2 +b3.FS3 +b4.FS4 +b5.FS5 + e8 (Eq. 8)
Or are you considering modeling FS as a higher-order reflective variable, such that:
FS1 = g1.FS + e9 (Eq. 9)
FS2 = g2.FS + e10 (Eq. 10)
FS3 = g3.FS + e11 (Eq. 11)
FS4 = g4.FS + e12 (Eq. 12)
FS5 = g5.FS + e13 (Eq. 13)
Your answers will guide my responses to the initial questions you asked.
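To make the contrast between Eq. 8 and Eqs. 9-13 concrete, here is a small simulation sketch in Python. Everything in it is an illustrative assumption rather than anything taken from the FLOSS data: the sample size, the loadings of 0.8, the weights of 0.2, and the helper `mean_offdiag_corr`. It shows only one observable difference: under the reflective equations the FS1-FS5 scores share a common cause and therefore correlate, whereas the components of a formed composite need not correlate at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000          # hypothetical sample size
k = 5               # FS1..FS5


def mean_offdiag_corr(x):
    """Average correlation between distinct columns of x."""
    r = np.corrcoef(x, rowvar=False)
    mask = ~np.eye(r.shape[0], dtype=bool)
    return r[mask].mean()


# Reflective case (Eqs. 9-13): one latent FS causes every FS_i.
g = np.full(k, 0.8)                              # assumed loadings g1..g5
fs_latent = rng.normal(size=n)
reflective = fs_latent[:, None] * g + rng.normal(scale=0.6, size=(n, k))

# Formative case (Eq. 8): FS is built from components that need not correlate.
b = np.full(k, 0.2)                              # assumed weights b1..b5
components = rng.normal(size=(n, k))             # independent by construction
fs_composite = components @ b                    # e8 set to zero for simplicity

reflective_r = mean_offdiag_corr(reflective)
formative_r = mean_offdiag_corr(components)
print(f"mean inter-item correlation, reflective: {reflective_r:.2f}")
print(f"mean inter-item correlation, formative:  {formative_r:.2f}")
```

High inter-item correlation does not by itself prove that a reflective model is right, so this is a way to see what the two sets of equations imply, not a test for choosing between them.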
3. I would also ask that you carefully scrutinize the following papers, since they will answer many of your questions directly.
A) Why one should not use formative variables in endogenous positions (i.e., why it would lead to invalid conclusions, for instance, to predict a formed FS focal variable with GOV) – read:
Cadogan, John W & Nick Lee (2013) Improper Use of Endogenous Formative Variables
Journal of Business Research 66(2):233-241. DOI:10.1016/j.jbusres.2012.08.006
(https://www.researchgate.net/publication/236004021_Improper_Use_of_Endogenous_Formative_Variables/file/3deec515880fd07e2c.pdf?origin=publication_detail)
B) Why it would be wrong (invalid) to model FS (or any other variable) as a higher order reflective variable, read:
Lee, Nick and John W. Cadogan (2013) Problems with Formative and Higher-Order Reflective Variables
Journal of Business Research 66(2):242-247.
https://www.researchgate.net/publication/236004024_Problems_with_Formative_and_Higher-Order_Reflective_Variables
see also
Becker, Jan-Michael, Kristina Klein, Martin Wetzels, (2012), "Hierarchical Latent Variable Models in PLS-SEM: Guidelines for Using Reflective-Formative Type Models", Long Range Planning, Volume 45, Issues 5–6, October–December 2012, Pages 359-394.
C) Why it would be wrong (invalid) to use a MIMIC model to identify (or represent) a formative focal variable – read:
Lee, Nick, John W. Cadogan, Laura Chamberlain (2013), “The MIMIC Model and Formative Variables: Problems and Solutions”, AMS Review 3(1):3-17.
https://www.researchgate.net/publication/236004028_The_MIMIC_Model_and_Formative_Variables_Problems_and_Solutions.
Cadogan, John W, Nick Lee, Laura Chamberlain (2013) “Formative Variables are Unreal Variables: Why the Formative MIMIC Model is Invalid”, AMS Review 3(1):38-49.
(https://www.researchgate.net/publication/236004029_Formative_Variables_are_Unreal_Variables_Why_the_Formative_MIMIC_Model_is_Invalid/file/72e7e515885291f858.pdf?origin=publication_detail)
Lee, Nick, John W. Cadogan, and Laura Chamberlain (2014), “Material and Efficient Cause Interpretations of the Formative Model: Resolving Misunderstandings and Clarifying Conceptual Language”, AMS Review (forthcoming)
(https://www.researchgate.net/publication/259360239_MATERIAL_AND_EFFICIENT_CAUSE_INTERPRETATIONS_OF_THE_FORMATIVE_MODEL_RESOLVING_MISUNDERSTANDINGS_AND_CLARIFYING_CONCEPTUAL_LANGUAGE/file/5046352b562774ef1f.pdf?origin=publication_detail)
D) For a paper that explains why PLS is not a good choice for assessing the quality of multi-item measures, or for testing structural theory about the relationships between variables – read:
Rönkkö, Mikko and Joerg Evermann (2013), A Critical Examination of Common Beliefs About Partial Least Squares Path Modeling. Organizational Research Methods 16(3): 425-448.
In light of the issues highlighted in these papers, my broad advice would be:
(1) Don’t use formative variables in endogenous positions, and question heavily why you are using formative variables in exogenous positions. Formative variables are higher-order, multidimensional things, by definition, and higher order constructs often hide critical processes or outcomes. Besides, there is probably a better, more interesting theory to be uncovered by focusing on the first order constructs anyway.
(2) Don’t use reflective higher order variables either… for the same reasons as above (they are invalid and hide real relationships).
(3) Don’t use PLS to develop or assess the quality of multi-item measures. Use a covariance based SEM package, which can model correlations between measurement errors, and can provide better information on construct dimensionality.
(4) Use a conventional (covariance based) SEM package to test your structural theory too - it has a Chi-square test, and allows you to formally test and reject theory (plus see Rönkkö and Evermann (2013) cited above).
(5) Your control variables are possibly just single indicators. However, they can still be modelled as latent variables in a covariance based SEM package (just make some assumptions about how reliable the single item measures are; they are unlikely to be perfect measures).
I am unsure how to answer your Qs 4 and 5 in light of these responses so far.
Have fun reading.
John
Hello, John!
Great read, indeed! Thanks a million for a fast and detailed feedback! Terrific!
I will answer your clarifying questions, read the sources you recommended and will get back to you as soon as possible.
Kind regards,
Alex
Hello, John!
I'm sorry about the delay with my reply. I didn't have a chance to read all papers you recommended, but decided to answer your clarifying questions to the best of my knowledge and post some comments.
===
"1. How is (G => GOV, and OS => SPON) a MIMIC model? I guess you have exogenous variables (the Gs), but what are the endogenous variables required to make the MIMIC model? Does the GOV construct directly impact on the five FS variables (FS1, FS2, FS3, FS4 and FS5)?" (Your Q.)
The following equations are correct:
GOV = a1.G1 +a2.G2 +a3.G3 +a4.G4 +a5.G5 + e1 (Eq. 1)
FS = c1.GOV + e2 (Eq. 2) [Eq. 3-7 do not make sense in my view of subject matter]
"What now are the endogenous variables that GOV itself has an impact on to make this a MIMIC model?" (Your Q.)
FS is such an endogenous variable (see Eq. 2). Considering both factors:
FS = c1.GOV + c2.SPON + e3 (Eq. 2')
===
"2. By the look of things, you appear to have a variable, FS, that is formative, being a composite of FS1, FS2, FS3, FS4, and FS5." (Your Q.)
I think this is a correct assumption in the sense that it matches the subject matter. Consequently, the following equation is correct (and Eq. 9-13 do not apply here):
FS = b1.FS1 +b2.FS2 +b3.FS3 +b4.FS4 +b5.FS5 + e8 (Eq. 8)
===
Combining insights from your questions, it appears that FS is indeed an endogenous formative variable, which you don't recommend to use (your ref. A). Moreover, you recommend (your ref. B) against modeling FS as a reflective variable, which brings me to a state of deep confusion...
I'd like to make one additional comment in regard to your recommendation against using PLS (your ref. D). I read that one of the benefits of the PLS approach is that it is better than covariance-based SEM for exploratory studies, where there is no firm theoretical foundation. I believe this is the case for my dissertation study, as a comprehensive theory of open source software development does not currently exist (as far as I know, at least). I also read about a (relatively) new approach in SEM for exploratory studies, called, obviously, exploratory SEM (ESEM). Assuming this is a valid SEM method and it can be applied to my study, I am aware of only one implementation of ESEM in a software package, and that is Mplus (https://www.statmodel.com/ESEM.shtml). Since I decided to use R packages for my SEM analysis (OpenMx and/or lavaan), I don't feel comfortable with considering ESEM unless it has been implemented in a package for R. I would love to know your opinion on this in addition to the above issues.
Another question is how to handle moderation and mediation (MM) in SEM. Among several other papers, I read the widely-cited work by Baron and Kenny (1986) [http://www.public.asu.edu/~davidpm/classes/psy536/Baron.pdf], but I'm not sure how to apply it to SEM. By reading notes on the topic by Jeromy Anglim [http://jeromyanglim.blogspot.com/2008/11/mediation-and-moderation-reference.html], it appears that it's pretty easy to test for MM in R, but I'm wondering if it automatically gets done as part of SEM analysis. I found some information on MM in the context of SEM in the work of Andrew Hayes (http://www.afhayes.com/introduction-to-mediation-moderation-and-conditional-process-analysis.html), but it seems that his implementations are again not in favor of R.
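For reference, the Baron and Kenny (1986) steps reduce to three regressions, and the same product-of-coefficients logic carries over to path models, where an indirect effect is the product of the paths along it. The sketch below uses plain least squares on simulated data rather than an SEM package. The variables `x`, `m` and `y`, the path values 0.6, 0.4 and 0.2, and the helper `ols` are all hypothetical.

```python
import numpy as np


def ols(y, *predictors):
    """Least-squares coefficients of y on an intercept plus the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(3)
n = 5_000                                    # hypothetical sample size
x = rng.normal(size=n)                       # predictor
m = 0.6 * x + rng.normal(size=n)             # mediator, assumed path a = 0.6
y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # outcome, assumed b = 0.4, c' = 0.2

c_total = ols(y, x)[1]            # step 1: total effect of X on Y
a = ols(m, x)[1]                  # step 2: effect of X on M
_, c_direct, b = ols(y, x, m)     # step 3: Y on X and M together
indirect = a * b                  # product-of-coefficients indirect effect

print(f"total={c_total:.3f} direct={c_direct:.3f} indirect={indirect:.3f}")
```

With least squares on a single sample, the total effect equals the direct effect plus `a * b`, which is a quick sanity check on the three fits. Moderation is a separate matter (an interaction term) and is not covered by this sketch.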
Thank you for your patience. I look forward to hearing from you.
Kind regards,
Alex
Hi Alex - let me digest your post after a "full on" day as a visitor presenting my work on this very topic at the University of East Anglia, here in the UK... :) I think I'll be able to offer some concrete advice on your concerns. Speak later!
John
Hi, John!
Great! I look forward to hearing from you. In the meantime, I'll try to read more from your list.
Best regards,
Alex
Can anyone provide an example of a reference that justifies the use of SEM with a convenience sample? (It could be a statistics paper and/or any type of published paper doing this.) Many thanks for this.
Hello, Giacomo!
You can find some great answers to your question in the following thread here on ResearchGate: https://www.researchgate.net/post/Is_it_appropriate_to_use_structural_equation_modeling_with_convenience_samples.
In regard to references, I found several examples, but I'm not sure how solid these journals are. You can search the Internet for more, if needed:
1) http://journals.humankinetics.com/jtpe-back-issues/jtpe-volume-31-issue-4-october/studentsrsquo-beliefs-and-intentions-to-play-with-peers-with-disabilities-in-physical-education-relationships-with-achievement-and-social-goals;
2) http://www.ijsmart.eu/onlinepic/vol5_1%20Choonghoon.pdf
3) http://informahealthcare.com/doi/abs/10.3109/17518423.2013.835357
Best regards,
Aleksandr
Hi Alex
attached are some comments on, and responses to, your Qs...
Best
John
John,
Thank you for your detailed answer and constructive comments! I would greatly appreciate if you could clarify some issues for me below.
1. I analyzed your answers and comments and I am happy to tell you that, in fact, your assumptions about my model do match my theoretical assumptions (hypotheses) about FLOSS Success (FS) phenomena. As I wrote above, FS is indeed a formative construct. So, I agree with your Fig. 2 and Fig. 3 from your previous reply. However, FS (and GOV) is a higher-order composite (please see my note #3 below).
2. There are several things that I'd like to note in regard to the model in Fig. 3. Firstly, I don't understand why you removed FS construct from the model, as in Fig. 2. As far as I understand, you "hide" the higher-order latent (formative) construct FS to imply that hypothesized effects are from success factors to FS components. I have a problem with that, because I don't think I can hypothesize such relationships a priori, due to lack of theoretical basis for that. In fact, I hypothesize effects from success factors' components (G1-G5 and OS1-OS5) to FS as a whole, as my proposed study is mainly exploratory. I was expecting that by proposing such "rough" hypotheses first and then performing SEM analysis I will be able to test them and also make some conclusions on potential (more "detailed") relationships/effects between success factors and FS components.
Secondly, my model is drawn in a way that two main factors are located on the left side (one above the other), while the dependent variable (FS) is on the right side. By this placement I want to imply potential causality in the model. I'm not sure if I'll be able to use SEM to test that implied causality, but, even if not, I think it would be good to reflect the temporal feature of my model (and, thus, reality). I assume that your drawing setup is simply due to the tool's (horizontal) layout space constraints.
3. In addition, similarly to your depiction of SPON construct being a composite of several latent constructs, both GOV and FS are, in turn, composite of several latent constructs as well. I believe that you call such constructs "higher-order formative". So, G1-G5 and FS1-FS5 are, in fact, not indicators, but latent constructs.
4. I'm not sure I understand your expression "rather vague conceptually" in reference to GOV and SPON constructs. What makes them vague and what I need to do to fix it?
5. I understand your recommendations on EFA and careful selection of indicators. Do you recommend to run CFA in addition to EFA?
6. However, I don't understand how my model is not meaningful and parsimonious. I want to study two presumably important factors (GOV & SPON) potentially affecting FLOSS success (FS). Each of these three constructs consists of several latent variables, thus, making GOV, SPON and FS higher-order formative constructs. I read many times in literature that for SEM analysis to be valid, it should be theory-driven. That's exactly what I'm trying to do. I build my theory by hypothesizing about relationships between higher-order constructs to produce conceptual model. Then I decompose these constructs to lower-order constructs to produce structural model. Then, finally, I decompose these lower-order constructs into indicators to produce measurement model. This is my understanding of the process.
7. I didn't understand your phrase "model the items as reflective single item measures of latent variables". Why would I do that, if the best (closest to reality) representation of those items is formative?
I'm sorry for maybe sometimes excessive details, but I just want to have a clear understanding of the subject. I look forward to hearing from you!
Kind regards,
Alex
Hi Alex
Not a point by point response - I think it would be much easier to talk it through with a pen and some paper :) But that isn't possible, so...
For me to really understand what you're doing/thinking, it might help if you draw the broad "measurement model" you are hoping to use.
That'll mean starting with items and first order factors and drawing arrows from the items to the latent variables (if formative) or from the latents to the items (if reflective).
Then show how the first-order latents form higher-order things. That'll mean arrows from the first order factors to the second order factors. Then it might be that you form a third order thing with your second order things.
Eventually, you'll stop constructing higher-order variables from lower order things.
From there you are hoping to model covariances between the higher-order variables.
Well - if you do this and show me a picture of what you're doing, then I'll be able to speak your 'language', so to speak.
But more fundamentally, the core issue is that formed variables are not conceptually meaningful in that they are just groupings of other things, and so have no real existence. They are not what Borsboom and colleagues would call "real" things. Because they are not conceptually real entities, they do not have real relationships with other variables. This is particularly problematic when you have two formed things. What you get then is a test of a model in which an "unreal" thing is predicting another "unreal" thing!
We provide some guidance on how to work out whether your variables are real or not in Cadogan et al. (2013):
Formative Variables are Unreal Variables: Why the Formative MIMIC Model is Invalid
https://www.researchgate.net/publication/236004029_Formative_Variables_are_Unreal_Variables_Why_the_Formative_MIMIC_Model_is_Invalid
(see the section: "Conclusions: where do we go from here?")
So perhaps you can reflect on the actual entities underpinning your model - and come up with some conclusion as to the real entities that underpin the model (the measures must have some 'real' things that are the foundational conceptual platform). It's only when you try to squash conceptually different 'real' things together that you 'form' something, and introduce conceptual ambiguity into the conceptual model. My suggestions in the last message were directed at finding conceptually real things and creating models from them (so yes, CFA after EFA).
I don't think a default formative approach will help you "explore the data"...
John
Ok - being a bit more specific.... (but understanding the answers does require that you digest the papers I referenced earlier).
POINT 2
Your point 2 raises some interesting issues.
You say "Firstly, I don't understand why you removed FS construct from the model, as in Fig. 2.”
Answer: Fig 2 is the only way that G and SPON could influence a thing called FS.
Your definition of FS is that FS is a composite of FS1 to FS5. If G causes FS, it cannot cause FS directly, because FS does not have its own existence independent of FS1, FS2, FS3, FS4, and FS5. The change in the column of data called FS occurs because FS1 or FS2 or FS3 or FS4 or FS5 has changed, and if G causes FS to vary, then, it must be because it causes FS1 or FS2 or FS3 or FS4 or FS5 to vary.
So in Fig 2, I show the real way that G and SPON could create change in a column of data called FS: through the “items”.
You then say: “As far as I understand, you "hide" the higher-order latent (formative) construct FS to imply that hypothesized effects are from success factors to FS components.”
Well, yes and no – I don’t hide FS. FS is not a “real variable” that needs hiding. It’s a fabricated thing, made up of FS1, FS2, FS3, FS4, and FS5. So as I say above, the ONLY way that G and SPON can change FS is by changing FS1, FS2, FS3, FS4, or FS5 – and that’s what Fig 2 “says”.
You continue: “I have a problem with that, because I don't think I can hypothesize such relationships a priori, due to lack of theoretical basis for that. In fact, I hypothesize effects from success factors' components (G1-G5 and OS1-OS5) to FS as a whole, as my proposed study is mainly exploratory. I was expecting that by proposing such "rough" hypotheses first and then performing SEM analysis I will be able to test them and also make some conclusions on potential (more "detailed") relationships/effects between success factors and FS components."
Hmmm. I’m afraid that, say, G1 cannot change FS – it can change FS1, FS2, FS3, FS4, or FS5 – but FS is a “made thing” that can vary only if its defining elements vary. So when you say you cannot hypothesize such relationships a priori, due to a lack of theory, your reasoning must also extend to FS – it must also be impossible to hypothesize a relationship between G1 and FS. In fact, it should be easier to hypothesise a relationship at the “lower levels”: assuming G1 is a real thing (not a formed thing), and that FS1 is also real (not formed), then it would be entirely possible to “think about” and eventually come to some logical conclusion regarding the following mutually exclusive options.
Option 1: G1 and FS1 have exactly a zero relationship.
Option 2: G1 and FS1 have a non zero relationship.
Also, and critically, if you squash up the FS1, FS2, FS3, FS4 and FS5 variables into a single column of data, then it’s perfectly possible that G1 might positively predict FS1 and FS2 and negatively predict FS3 and FS4, but that on average, G1 might not predict FS as a single composite (because all the positives and negatives cancel out). The conclusion that G1 is not important because it does not “cause” FS would be misguided – G1 has a relationship with lots of FS’s components (in this fictional example).
In sum, then, Alex, I would suggest that it will be easier and more meaningful to “come up” with realistic exploratory hypotheses regarding relationships between real variables than between formed ones. The drawback? The model gets more complicated because the number of relationships is greater (relative to a model with just a few formed things in it).
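The cancellation described in this fictional example is easy to reproduce. The following Python sketch simulates it under assumed values: slopes of +0.5, +0.5, -0.5, -0.5 and 0 for FS1-FS5 on G1, an equal-weight composite, and a hypothetical sample size. None of these numbers come from real data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000                                   # hypothetical sample size

g1 = rng.normal(size=n)
# Assumed slopes of FS1..FS5 on G1: two positive, two negative, one zero.
slopes = np.array([0.5, 0.5, -0.5, -0.5, 0.0])
fs_parts = g1[:, None] * slopes + rng.normal(size=(n, 5))

# Equal-weight composite, i.e. Eq. 8 with b1 = ... = b5 and no error term.
fs = fs_parts.mean(axis=1)

part_corrs = [np.corrcoef(g1, fs_parts[:, j])[0, 1] for j in range(5)]
composite_corr = np.corrcoef(g1, fs)[0, 1]

print("G1 vs FS1..FS5:", np.round(part_corrs, 2))
print("G1 vs composite FS:", round(composite_corr, 2))
```

In this setup G1 is clearly related to four of the five components, yet its correlation with the composite is close to zero, because the weighted slopes sum to zero.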
Regarding your issue about testing for causality: SEM cannot test for causality… it can provide evidence on relationships between variables, and compare theory with data. That’s all. Even if the data match the theory, this still isn’t proof that the theory is valid. All we do with SEM then is gather evidence that gives us more or less confidence in our theory of causality.
POINT 3
You say that “G1-G5 and FS1-FS5 are, in fact, not indicators, but latent constructs.” Out of interest, are the G1-G5 and FS1-FS5 variables formed things? For instance, is G1 a unidimensional single entity that is not formed from anything else, but is a single variable? And so on. Or is G1 actually itself a composite of other issues?
POINT 4
You ask “I'm not sure I understand your expression "rather vague conceptually" in reference to GOV and SPON constructs. What makes them vague and what I need to do to fix it?” I hope that you can see that anything that is formed, and that doesn’t have its own identity, is vague at a conceptual level. It’s not unique to your GOV and SPON variables – it’s inherent in formed variables. The solution or fix is to look to identify possible real dimensions underpinning your data (e.g. using EFA on the measurement items).
POINT 5
“Do you recommend to run CFA in addition to EFA?” Yes!
POINT 6
You comment that: “I build my theory by hypothesizing about relationships between higher-order constructs to produce conceptual model. Then I decompose these constructs to lower-order constructs to produce structural model. Then, finally, I decompose these lower-order constructs into indicators to produce measurement model. This is my understanding of the process.”
Building hypotheses about relationships between higher-order variables is often quite difficult – the higher-order things are contrived / formed and do not have conceptual unity internal to their meaning. Relationships or lack of relationships with other variables are essentially meaningless. Sorry.
POINT 7
You didn't understand the phrase "model the items as reflective single item measures of latent variables". You ask “Why would I do that, if the best (closest to reality) representation of those items is formative?”
First – to reiterate - formative things are not representative of real entities. Formed things are not real things - they are artificially created things. If you have a real entity that you want to measure, and you think your formative items measure that entity, they do not. Formative things are not measures (see Lee et al. 2014). The formative items might be considered as causes of
Lee et al 2014 https://www.researchgate.net/publication/259360239_MATERIAL_AND_EFFICIENT_CAUSE_INTERPRETATIONS_OF_THE_FORMATIVE_MODEL_RESOLVING_MISUNDERSTANDINGS_AND_CLARIFYING_CONCEPTUAL_LANGUAGE
Second, you may have a question in your data that is a stand-alone question. It is a single item. The item may be a valid measure of an important latent variable. If you do not have other reflective items to measure that latent variable then you have no choice but to just use the single item as a measure of it. Such measures will not identify in an SEM model, so you need to make assumptions about the quality of the measure (its reliability) in order to use them in structural models. That is what I meant by “as reflective single item measures”.
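One common way to encode such a reliability assumption is to fix the item's loading at 1 and its error variance at (1 - reliability) times the item's sample variance. The Python sketch below computes those two fixed values. The helper name `single_item_constraints`, the simulated item and the assumed reliability of 0.8 are illustrative assumptions only; the right reliability value has to be argued for case by case.

```python
import numpy as np


def single_item_constraints(item, assumed_reliability):
    """Return (loading, error_variance) to fix for a one-item latent variable."""
    if not 0.0 < assumed_reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    variance = np.var(item, ddof=1)          # sample variance of the item
    return 1.0, (1.0 - assumed_reliability) * variance


rng = np.random.default_rng(2)
true_score = rng.normal(size=10_000)
# Simulated item: true score plus noise, so its reliability is 1 / 1.25 = 0.8.
item = true_score + rng.normal(scale=0.5, size=10_000)

loading, err_var = single_item_constraints(item, assumed_reliability=0.8)
print(f"fix loading at {loading}, error variance at {err_var:.3f}")
```

The two numbers would then be entered as fixed parameters in whichever SEM package is used. It is worth rerunning the model under a few different assumed reliabilities to see how sensitive the structural estimates are to that choice.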
So - hope there aren't too many typos and that what I say makes sense:)
Hi, John!
I just discovered that you posted more details. This is just truly awesome! Please don't worry about typos, for this kind of conversation I only care about the essence. What you say totally makes sense, however, I need some time to digest all that wisdom... :-)
Just to save some electrons from an extra round-trip across the ocean, I'm attaching my SEM model. It is VERY preliminary and might contain minor or major mistakes, but I'd like you to see what I have in mind / work on. For the sake of simplicity and space constraints, I show only single indicator per latent variable, while in reality for most there are multiple indicators per variable.
It's also possible that I would need to introduce one more "level" of latent variables on each "side" of the model (IVs/DV) to reflect reality. You can understand the reasons by looking at my preliminary operationalization of the model's variables, contained in the end of the document I'm sending to you in personal message [as RG doesn't allow for multi-item attachments in Q&A] (contains that plus some other excerpts from my proposal that I thought you may find interesting). (Please ignore detailed SEM model in the text.)
Kind regards,
Alex
John,
Sorry, forgot to clarify. Above I say: "It's also possible that I would need to introduce one more "level" of latent variables on each "side" of the model (IVs/DV) to reflect reality". What I mean is that it would reflect "Attribute/Measure" column (vs. "Metric/Indicator" column reflecting real measured items) in operationalization tables in Appendix A of the document I just sent you.
Thank you,
Alex
Hi, John!
Sorry about the "silence" - I was busy with my data collection activities. Thank you for your comments in my document. However, it's not clear to me, whether you had a chance to see my more realistic model I've attached (PDF) to my message here 10 days ago. Could you comment on it?
In regard to your feedback in the document, I do plan to perform EFA and CFA, though I don't have prior experience in them. However, I am hesitant to reduce the hierarchy in latent variables (that I understand you suggest), because this hierarchy makes sense both from theoretical and practical perspectives of subject matter. Additionally, my dissertation committee won't agree with this. My SEM model in the very initial revisions of the idea paper was like that (many first-degree latent variables), but they insisted on simplifying the model and keeping the topic narrow enough. And while, in the beginning, I liked my "big" model as I wanted to create a "comprehensive" model of FLOSS, I then realized that it would just be impossible to perform any reasonable and valid research with so many variables and an insane number of hypotheses to test due to the multitude of paths between the variables.
I haven't had a chance to read all papers you recommended, but plan to do it as soon as possible. In the meantime, I'd appreciate if you could reply to this message and share your thoughts on my points and also on my PDF SEM model. Thank you!
Kind regards,
Alex
Hello Alex, you're correct. I did not see the PDF file from 10 days ago. I'll take a look as soon as I can.
Best
John
Alex,
my immediate comment is that the model is impossible. At least from a logical perspective, it is impossible for our “world” to work the way your picture displays things. This is just a fact. FS, the focal construct, is conceptually the same thing as a weighted sum of FS1, FS2, FS3, FS4, and FS5. So, any change in FS must occur because FS1, FS2, FS3, FS4 or FS5 have changed. From a logical perspective, what your model says is impossible because there are paths from GOV or SPON to FS, but no paths to FS1, FS2, FS3, FS4 or FS5. That means you are saying that it is possible for FS to change in value (as a result of variance in GOV or SPON), but for the things that “are” FS to stay exactly the same. The only system that I know of that would allow that to happen would be magic.
Well, that’s how logic works. If your committee are very set on forcing you to model illogical models, then you must either explain to them why that’s wrong, or look at adding some supplementary (but logical) model to the illogical model they are insisting on.
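Written out, the argument is a one-line consequence of the definition. The notation below is a sketch that reuses the weights b1-b5 from Eq. 8 and assumes the error term e8 is zero, i.e. that FS is exactly the weighted sum:

```latex
FS = \sum_{i=1}^{5} b_i \, FS_i
\quad\Longrightarrow\quad
\Delta FS = \sum_{i=1}^{5} b_i \, \Delta FS_i ,
\qquad
\operatorname{Cov}(GOV, FS) = \sum_{i=1}^{5} b_i \, \operatorname{Cov}(GOV, FS_i).
```

If every \Delta FS_i is zero then \Delta FS is zero, and any covariance between GOV and FS is entirely made up of covariances between GOV and the individual FS_i. The same holds with SPON in place of GOV.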
I’m interested, though, in your own statement that you are: “hesitant to reduce the hierarchy in latent variables (that I understand you suggest), because this hierarchy makes sense both from theoretical and practical perspectives of subject matter”.
I don’t really understand what you mean by this. FS is a thing that is entirely artificial. It is a constructed thing. It is formed. If it were a real entity, it would be unidimensional. Your idea that a hierarchy makes sense supports the possibility that you visualize FS as a real thing, a thing that could be measured directly, but that you simply don’t have a direct measure of it. Possibly then, the FS1, FS2, FS3, FS4 or FS5 “things” might be close causes of the FS thing you want to measure. Well, maybe it’s something like that.
I think it’s critical that you get your head around the reality of what a formed variable is. It is not a single thing that has some special meaning that exists independently of the “indicators”. It is just the stuff that is used to make it. In your model, FS is just FS1, FS2, FS3, FS4 or FS5. It’s nothing more than that. And if you think it is more than that, then I strongly suggest that you formally conceptualize the stuff that it is that is not FS1, FS2, FS3, FS4 or FS5, and develop a measure of that stuff (a measure that is reflective and not formative). Otherwise, you really have no logical option but to model GOV & SPON as antecedents to the individual FS1, FS2, FS3, FS4 or FS5 variables.
The fact that all your other variables are formative is a major complication, but the core issue you must deal with first is the issue of "what is FS?"
For sure, you probably need to simplify the model to make it manageable + understandable.
All the best
John
John,
I greatly appreciate your fast feedback. Now I will have to take some time to digest it and clarify all things for myself. I will be in touch as soon as possible (considering that I still need to work on my data collection phase and corresponding R code).
Best wishes,
Alex
Hello, John!
I'm back. Sorry about the delay. Now I will try to combine the discussion with you and the work on clarifying conceptual issues of my research with continuing my R implementation of the data analysis framework that I have recently been busy working on. I'd like to comment on some aspects of your most recent reply.
My understanding is that the main reason for using latent variables (conceptual constructs) in models (and models themselves) is the desire to simplify the real world. Regardless of whether a latent variable is formative or reflective, it's an artificial "thing", an imaginary object. With that in mind, I don't understand how people create and use zillions of models, where they hypothesize paths between latent variables, in addition to, and, sometimes, instead of, some paths between latent variables and indicators. According to my limited understanding of the topic, paths between latent variables and indicators represent (explain) correlation between indicators (items), while paths between latent variables represent (explain) correlation between latent variables.
Returning to my SEM model, I realize that in reality there exist some effects between GOV/SPON and each FS dimension (FS1-FS5). However, I believe that these effects explain FS variance only partially for at least two reasons: 1) there might be some other (unknown or out of scope of this study) FS dimensions that GOV/SPON have an effect on; 2) there might be some interactions between FS dimensions (covariances). This is where IMHO second-order (or any higher-order) variables come into play, potentially representing (capturing, explaining) interactions between FS dimensions, between GOV dimensions and between SPON dimensions, respectively.
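To make the covariance argument concrete, here is a small sketch of what a second-order factor implies about the covariances among its dimensions. It assumes a reflective second-order factor, which is not how the model in this thread is specified, and the loadings and residual variances are hypothetical numbers chosen only for illustration.

```python
def implied_covariance(loadings, factor_variance, residual_variances):
    """Model-implied covariance matrix for first-order dimensions that all
    load on one reflective second-order factor:
        cov(FSj, FSk) = l_j * l_k * phi          (j != k)
        var(FSk)      = l_k**2 * phi + theta_k
    """
    k = len(loadings)
    cov = [[loadings[i] * loadings[j] * factor_variance for j in range(k)]
           for i in range(k)]
    for i in range(k):
        cov[i][i] += residual_variances[i]
    return cov

# Hypothetical values for five FS dimensions (illustration only).
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]
residuals = [0.19, 0.36, 0.51, 0.64, 0.75]
sigma = implied_covariance(loadings, 1.0, residuals)

for row in sigma:
    print([round(v, 2) for v in row])
```

In this reflective case every covariance between two dimensions is forced to equal the product of their loadings times the factor variance. In a formative specification the covariances among the dimensions are typically left free rather than explained by the higher-order variable, so whether a higher-order FS "captures" those covariances depends on which of the two specifications is intended.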
I think that the following excerpt by David Kenny (2011) supports my logic: "The major uses of a second-order are as follows: First, in one has a construct but finds that it is multi-dimensional by creating a second-order factor one can preserve the construct. Second, if a set of latent variables all cause the same construct, their colinearity may difficult to separate their effects, but by having the causality work through a single second-order factor, the colinearity is reduced. Third, by having just one latent variable instead of many, a second-order model is more parsimonious." I had a better reference, but can't find it at the moment.
Finally, I've had a chance to access and read the papers that you recommended, along with several other papers on the topic. All the papers are interesting, but they will probably require more than one reading in order to better understand the essence of the arguments. As a side note, your paper "Problems with formative and higher-order reflective variables" (Lee & Cadogan, 2014) does not seem to be applicable to my SEM model, where all latent variables are formative.
I understand that there exists an ongoing debate in the SEM research community. Initially I thought that SEM, being based on statistics, which, in turn, is based on mathematics, has a rather well-defined set of rules and guidelines. Now I understand that this expectation is far from the truth. While any constructive debate is useful for scientific progress, the existence of multiple opinions and schools of thought presents beginner SEM researchers, like me, with a problem of confusion. This stems, in large part, from my lack of statistical and SEM experience, but I hope that things will start falling into place as I read more, think more and try to perform more SEM research. In the meantime, I will do my best to understand and perform SEM, using logic and simple conceptual analysis.
UPDATE: I forgot to mention that I found another interesting paper (Coltman et al., 2008), which presents a framework that claims to help researchers "to design and validate both formative and reflective measurement models".
References:
Coltman, T., Devinney, T. M., Midgley, D. F., & Venaik, S. (2008). Formative versus reflective measurement models: Two applications of formative measurement. Journal of Business Research, 61(12), 1250-1262.
Kenny, D. (2011, September 7). Miscellaneous Variables. Retrieved from http://davidakenny.net/cm/mvar.htm
Lee, N., & Cadogan, J. (2014). Problems with formative and higher-order reflective variables. Journal of Business Research, 66, 242-247.