I intend to improve a scale, but not by generating a large pool of self-created items and then deleting them bit by bit; rather, I want to add items from already validated scales. Does anyone have experience with that, or suggestions on how to proceed?
I agree with Timo--it all comes back to theory. If you have a theoretical justification for adding items, then do so and run them through CFA. Being purely empirically-driven (i.e., just chucking random items to see what they "do") is less wise.
An additional point, though: You mention that your reason for adding items is to improve reliability. However, reliability and validity are distinct characteristics of a measure. It is possible for a scale to demonstrate low reliability yet still possess validity (i.e., accurately measure the latent construct after error variance is removed). If you have evidence that the scale is valid, I would suggest caution in adding items solely for the sake of increasing alpha (especially given that alpha = .65 isn't really that low). Increasing reliability at the (potential) cost of validity isn't a good tradeoff--especially when latent variable methods correct for the unreliability and provide you a true measurement of the latent construct purified of error variance.
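To put numbers on that point: below is a minimal Python sketch (with made-up data and hypothetical names) of Cronbach's alpha and the Spearman-Brown prophecy formula. It is only meant to illustrate that alpha rises more or less mechanically when a scale is lengthened with parallel items, while saying nothing about validity.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def spearman_brown(alpha: float, lengthening_factor: float) -> float:
    """Projected reliability if the scale is lengthened by `lengthening_factor` with parallel items."""
    return (lengthening_factor * alpha) / (1 + (lengthening_factor - 1) * alpha)

# Unrelated random items share no common variance, so alpha stays near zero
# no matter how many of them you add.
rng = np.random.default_rng(0)
fake = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"item{i}" for i in range(1, 6)])
print(cronbach_alpha(fake))

# Hypothetical: a 5-item subscale with alpha = .65, lengthened to 8 parallel items.
print(spearman_brown(0.65, 8 / 5))  # roughly .75 -- alpha rises, validity is untouched
```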
For discussion of the possibility of low reliability and high validity, check out:
Little, T. D., Lindenberger, U., & Nesselroade, J. R. (1999). On selecting indicators for multivariate measurement and modeling with latent variables: When “good” indicators are bad and “bad” indicators are good. Psychological Methods, 4, 192-211. doi:10.1037/1082-989X.4.2.192
Hi Katharina, although I can't help you out directly, I recommend that you look over some of the responses to this post; I think they may help you out!
@all: Thanks a lot for all your thoughts, and sorry for not being very precise in my problem description. It is really a question of method uncertainty rather than a problem as such. I have an empirically tested/retested cultural identity scale with 10 items and two underlying dimensions (identification with culture A and identification with culture B). I would like to add more items to the scale in order to improve the reliability of both dimensions/factors (both alphas are around .65). Hence my questions are:
Can I simply create a survey consisting of my scale plus, say, three similar scales, run the data through AMOS (for example), and check whether, and which, items of those three extra scales fit either of the two dimensions of my cultural identification measure? If items of an extra scale fit dimension/factor 1 of my cultural ID scale, can I add them? Are there legal issues, i.e., would I be stealing someone's invented item by adding it to my scale? Is this a common, or at least accepted, method of scale improvement/extension? Any reading suggestions on this topic?
Sorry for this big gap in my knowledge. Hope you can help me :)
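For what it's worth, here is one hedged sketch of what "throwing the items in and checking which fit" could look like outside AMOS, for example in Python with the semopy package. All file, variable, and item names are hypothetical stand-ins, not the actual instrument: the idea is simply to fit the original two-factor model with and without the borrowed candidate items and compare fit statistics and loadings.

```python
import pandas as pd
import semopy

# Hypothetical data: ca1..ca5 and cb1..cb5 are the original items,
# ext1..ext3 are candidate items borrowed from another published scale.
data = pd.read_csv("cultural_identity.csv")  # placeholder file name

baseline = """
IdentityA =~ ca1 + ca2 + ca3 + ca4 + ca5
IdentityB =~ cb1 + cb2 + cb3 + cb4 + cb5
"""

extended = """
IdentityA =~ ca1 + ca2 + ca3 + ca4 + ca5 + ext1 + ext2
IdentityB =~ cb1 + cb2 + cb3 + cb4 + cb5 + ext3
"""

for name, desc in [("baseline", baseline), ("extended", extended)]:
    model = semopy.Model(desc)
    model.fit(data)
    print(name)
    print(semopy.calc_stats(model).T)  # chi-square, CFI, RMSEA, etc.
    print(model.inspect())             # loadings: do the borrowed items load where expected?
```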
@Daniel Clark: Thanks for that hint, but there the problem is the opposite: he has lots of items he can cut down. I have no items that I could exclude; I want to add items to hopefully improve validity and reliability.
Hi Katharina, your research sounds interesting. However, Jim Turner is correct: as soon as you amend a validated scale it is no longer validated, which can draw severe criticism if you plan on publishing in a peer-reviewed journal. Perhaps you could consider using the existing scale and then creating a composite measure of your own for the missing components. If there is indeed a gap and your scale fills it, you should consider performing a validation study of your own, which could be a separate publication and study in its own right. Good luck.
Using the literature as a source for the pool of items is quite common; however, I cannot say the same for your particular method (integrating 3-4 scales). Churchill (1979) suggested a stepwise procedure for developing measures that you may find useful (there are also updates to this procedure, e.g., Gerbing and Anderson, 1988):
1. Specify the domain of the construct (you can use the literature review as well as other sources)
2. Generate a sample of items (again, the literature review can be a very important source)
3. Collect data
4. Purify measure
5. Collect data
6. Assess reliability
7. Assess validity
I think a high level of similarity between items (which is not necessarily desirable) enhances internal consistency; therefore, putting items from 3 or 4 different scales together does not necessarily improve your alpha.
I agree with Andrew and Timo about the role of theory and the use of CFA, but since you will probably have a large pool of items, it would be more appropriate to run an EFA (e.g., in SPSS) first in order to purify the items, and then run a CFA (see the sketch below for one way this could look).
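To make the EFA-first step concrete, here is a minimal sketch using Python's factor_analyzer package instead of SPSS. The pooled item data and the .40/.30 cut-offs are assumptions for illustration, not fixed rules:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

pool = pd.read_csv("item_pool.csv")  # hypothetical pooled item responses

# Two factors (identification with culture A and with culture B),
# oblique rotation because the dimensions are likely correlated.
efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
efa.fit(pool)

loadings = pd.DataFrame(efa.loadings_, index=pool.columns,
                        columns=["FactorA", "FactorB"])
print(loadings.round(2))

# A simple purification heuristic: keep items that load at least .40 on one
# factor and below .30 on the other (one common rule of thumb, not a standard).
keep = loadings[(loadings.abs().max(axis=1) >= 0.40) &
                (loadings.abs().min(axis=1) < 0.30)]
print(keep.index.tolist())
```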
Churchill, G. A. (1979) A Paradigm for Developing Better Measures of Marketing Constructs. Journal of Marketing Research, 16(February), 64-73.
Gerbing, D. W. and Anderson, J. C. (1988) An Updated Paradigm for Scale Development Incorporating Unidimensionality and Its Assessment. Journal of Marketing Research, 25(2), 186-192.
Cultural frameworks may be difficult to test with traditional questionnaires in cross-cultural studies, for a simple reason: words are cultural vectors. The words of a language are themselves an element of a culture. Just remember that you cannot get a valid test, at any level, by simply translating a test that was validated in another language. Language is a cultural expression; it is not independent of culture itself. For cultural studies I usually use non-verbal stimuli (generally visual).
For one approach to statistically evaluating construct comparability across cultural groups, check out:
Little, T. D. (1997). Mean and covariance structures (MACS) analyses of cross-cultural data: Practical and theoretical issues. Multivariate Behavioral Research, 32(1), 53-76.
@Giuseppe Battistella: Thanks for the reminder, but as it happens I am a cross-cultural psychologist ;P so it is my goal to detect such cultural differences. The original scale which I intend to use for the extension study has already been validated across four countries (CHN, UK, GE, US), and a test-retest analysis is in progress across five countries (CHN, IND, GE, US, UK) with a minimum of 100 participants in each cultural group.
@Andrew Ledbetter: Interesting, thanks for that; I will have a look at it. In general I have to be aware of response bias, as Chinese participants answer Likert scales in totally different patterns than Americans. Group-mean standardization was suggested to me as a common solution for that. Would you agree, or do you have any other option in mind?
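As a purely illustrative aside on group-mean standardization: a minimal pandas sketch, assuming a hypothetical country column and item columns. Note that standardizing within groups removes true group mean differences along with response-style bias, so whether it is appropriate depends on the analysis.

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical: one row per participant
item_cols = [c for c in df.columns if c.startswith(("ca", "cb"))]

# z-standardize each item within each cultural group, so that group-level
# differences in scale use (e.g. acquiescence, extreme responding) are removed.
df[item_cols] = (df.groupby("country")[item_cols]
                   .transform(lambda x: (x - x.mean()) / x.std(ddof=1)))
```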
That's not a bad idea and probably necessary for some sorts of analyses. Briefly, what the Little article suggests is running a multiple-group CFA that iteratively constrains elements of the model (item loadings, then item intercepts, then the covariance paths among latent constructs) in a series of nested tests. If item loadings and intercepts are statistically equivalent across groups, Little argues the constructs are comparable across groups.
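To illustrate only the first (configural) step of that sequence: the sketch below fits the same two-factor model separately in each group with Python's semopy package (hypothetical variable and file names). The actual nested invariance tests, i.e. equality constraints on loadings and then intercepts across groups, are what multiple-group SEM software such as AMOS provides; this sketch is not a substitute for them.

```python
import pandas as pd
import semopy

desc = """
IdentityA =~ ca1 + ca2 + ca3 + ca4 + ca5
IdentityB =~ cb1 + cb2 + cb3 + cb4 + cb5
"""

df = pd.read_csv("survey.csv")  # hypothetical multi-country data

# Configural step: does the same factor structure fit acceptably in every group?
for country, group in df.groupby("country"):
    model = semopy.Model(desc)
    model.fit(group)
    print(country, semopy.calc_stats(model)[["CFI", "RMSEA"]])
```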
The technique is applicable in any multiple-group context, not just cross-cultural groups; see, for example, how my colleague and I used the test to compare young adult children from divorced and non-divorced families:
Schrodt, P., & Ledbetter, A. M. (2007). Communication processes that mediate family communication patterns and mental well-being: A mean and covariance structures analysis of young adults from divorced and non-divorced families. Human Communication Research, 33, 330-356.
@Andrew Ledbetter: Great, thanks for your article. Indeed, I talked about multiple-group CFA with my supervisor. The only problem is that my sample size then needs to be even bigger... or I could parcel items into groups. Again, thanks a lot for the readings. I will search for them now in our e-library :)
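On parceling, a tiny illustrative sketch (the parcel assignments and item names are hypothetical; how items are allocated to parcels is a substantive choice): average fixed subsets of items into parcels and use the parcels, rather than the raw items, as indicators in the CFA, which reduces the number of estimated parameters relative to sample size.

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical data

# Hypothetical parcel assignment for the IdentityA items (e.g. balanced by loading).
parcels = {
    "A_parcel1": ["ca1", "ca4"],
    "A_parcel2": ["ca2", "ca5"],
    "A_parcel3": ["ca3"],
}
for parcel, items in parcels.items():
    df[parcel] = df[items].mean(axis=1)

# The CFA would then use A_parcel1..A_parcel3 as indicators of IdentityA
# instead of the five raw items.
```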