I have pilot data for a scale I am developing and would like to know what analyses to write up. Should I do a Cronbach's alpha followed by a split-half reliability? Should I do EFA first or CFA first, or only one of them? Thanks.
How many participants does your pilot include? I agree with the above suggestions: I would do descriptive analyses first, including checking the discriminatory power of the individual items. Factor analyses are not recommended for small sample sizes.
Thanks Alvin, Gregor and Perikles. My sample is 85, which is just above the 80 minimum (I have 16 items, so 16 x 5 = 80). My theory is that the scale will form two factors, but EFA (PCA) is giving me two factors that don't match the items that "should" load on them according to the theory.
I think you should do some item selection first, as Alvin suggests.
First, remove participants with too many missing responses, or whose responses you otherwise have reason to doubt (e.g., a wrong answer to a bogus item, the same response to every question, etc.).
Then, see if there are items with ceiling/floor effects, a shorter range than expected, low variability, high skew, or an unusually high number of missing responses. Also check whether there are pairs of items that are too highly correlated (r > .7, unless the average inter-item correlation is very high). Unless you have done this already, double-check the item wording: are there items which actually pose two questions instead of one? Are there items which make unfounded assumptions about the participant? Are there items which are not clearly worded? Are there items which begin with, or even just contain, the word "not"? All of these items are prime candidates for deletion.
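The screening steps above can be sketched in a few lines of pandas. This is a minimal illustration on simulated data: the 85 x 16 data frame, the 1-5 response scale, and the .7 correlation cutoff are assumptions taken from this thread, not from any particular dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical pilot data: 85 respondents x 16 Likert items scored 1-5.
data = pd.DataFrame(rng.integers(1, 6, size=(85, 16)),
                    columns=[f"item{i+1}" for i in range(16)])

# Flag respondents who gave the same answer to every item (straightlining).
straightliners = data.nunique(axis=1) == 1

# Per-item screening statistics: floor/ceiling, variability, skew, missingness.
stats = pd.DataFrame({
    "floor": (data == 1).mean(),      # proportion at the scale minimum
    "ceiling": (data == 5).mean(),    # proportion at the scale maximum
    "sd": data.std(),                 # low variability flags weak items
    "skew": data.skew(),              # high |skew| flags lopsided items
    "missing": data.isna().mean(),
})

# Pairs of items correlated above .7 (candidates for dropping one of the pair).
corr = data.corr()
high_pairs = [(a, b) for a in corr.columns for b in corr.columns
              if a < b and abs(corr.loc[a, b]) > 0.7]
```

Items flagged in `stats` or appearing in `high_pairs` would then be reviewed against the wording checks before deletion.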
Only after this, do an EFA. Use parallel analysis to check the optimal number of factors. Experiment with different rotations, and see if there is a solution which is theoretically meaningful (in general, I always suggest oblique rotations - Geomin or Promax). Eliminate items with no loadings > .3, and after removing each item perform the parallel analysis and the EFA again. Then, remove items which have high loadings on more than one factor (unless the factors are very different from one another). With luck, at the end of this you should obtain a simple factorial structure.
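Parallel analysis itself is simple enough to sketch in plain numpy: retain a factor only if its observed eigenvalue exceeds the corresponding eigenvalue from random data of the same shape. The function name and the two-factor toy data below are illustrative, not from this thread.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis: count eigenvalues of the observed correlation
    matrix that exceed the 95th percentile of eigenvalues from random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sim_eig = np.empty((n_sims, p))
    for i in range(n_sims):
        sim = rng.standard_normal((n, p))
        sim_eig[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False)))[::-1]
    threshold = np.percentile(sim_eig, 95, axis=0)
    return int(np.sum(obs_eig > threshold))

# Toy check: 8 items driven by two latent factors plus noise.
rng = np.random.default_rng(1)
f = rng.standard_normal((200, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8   # items 1-4 load on factor 1
loadings[1, 4:] = 0.8   # items 5-8 load on factor 2
x = f @ loadings + 0.5 * rng.standard_normal((200, 8))
n_factors = parallel_analysis(x)
```

With a clear two-factor structure like this, `n_factors` comes out as 2; on real pilot data the answer is the number of factors worth rotating and interpreting.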
Then, gather some more data and perform CFA on the new sample only. If the EFA was unsuccessful, instead, try with a bigger item pool, taking inspiration from the few items which seemed to work and learning from the items which you had to exclude.
If you have double loadings or cross-loadings, then the offending items should be rephrased or eliminated, and the preliminary stats redone for each modification. Hope it helps!
I had a visiting student from Monash U, several years ago, and she was brilliant!
The above responses to your questions are all "valid" and important.
I would also recommend that you step back and look at the big picture before thinking about the analyses.
First, where did your measure come from? Do you have content validity? Did you develop a table of specifications that outlines the content areas and processes measured by the items? If this measure assesses one construct, what theoretical reasons do you have to support the claim that the items measure the construct?
Empirical analyses to support construct validity are great, but you need to demonstrate that you have content validity. How many items do you have and which content areas and processes do they assess in the overall construct?
Here is an example. One can talk about the amorphous concept of depression in terms of cognitive, physical and emotional symptoms. One can also talk about this construct in terms of symptom intensity, frequency and duration. If you create a Table of Specifications for the content and processes you hope to measure, this will be helpful to talk about validity before you do SEM and other analyses. It will help you to make the best use of your empirical analyses.
It depends! If you want to validate a questionnaire, you must account for the tendency to give socially desirable answers. The validity of a questionnaire measuring aptitudes, opinions, or satisfaction may be influenced by respondents' tendency to give socially desirable answers to its items. That tendency is especially strong when the items refer to embarrassing or intimate issues, or when a truthful answer threatens the respondent's confidence. So, if possible, avoid items of this type. You can also apply measures of the tendency to give socially desirable answers: for example, you can use Edwards's (1957) social desirability measure to correct the questionnaire answers, or to eliminate from the sample respondents who show that tendency strongly. If you want to move faster, try writing more neutral items and testing them in a preliminary study.
Rasch analysis provides significant insight into the psychometric properties of a scale, including appropriate use of response categories, measurement precision, how well items fit the underlying trait, unidimensionality, targeting of item difficulty to patients' ability, and differential item functioning (DIF). Rasch analysis is important for studies using rating scales, as loss of measurement quality due to participants' poor understanding of the questions or underutilisation of response categories can reduce the value of clinical research.
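For readers unfamiliar with the model behind this advice: in the dichotomous Rasch model, the probability of endorsing an item depends only on the difference between person ability and item difficulty, which is what makes "targeting" meaningful. A minimal sketch (function name and values are illustrative):

```python
import math

def rasch_prob(theta, b):
    """Dichotomous Rasch model: probability that a person with ability theta
    (in logits) endorses an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5;
# a well-targeted scale puts item difficulties near the sample's abilities.
p_matched = rasch_prob(0.0, 0.0)
p_easy = rasch_prob(1.0, -1.0)   # ability two logits above difficulty
```

Polytomous extensions (rating scale and partial credit models) generalise this to the multi-category response scales discussed above.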
I agree with the above regarding DIF analysis, but it all depends on how you intend to use your instrument. If it will be used in different languages, then you can use the procedures outlined in Van DeVijver's "Adapting educational and psychological tests for cross-cultural assessment". Bruno Zumbo also has a new book out on the validation of scales that is more statistically oriented.
I have a resource I'd like to recommend. This is part of a series of statistical and psychometric books that are very reasonable and readable-- and since it's an e-book you could get it immediately. David Garson's "Validity and Reliability". http://www.amazon.com/gp/product/B00BKP6BQ6
All of the answers are very useful, but be careful of two often overlooked issues.
1. The rule of subjects to items in initial scale development should be 10 to 1, so unless you have a highly valid set of items, you are likely to get results that could be hard to replicate. Next, to correctly do the EFA, use common factor analysis, NOT principal components. Then use an oblique rotation (NOT varimax) and examine the pattern matrix to see how the items are associated.
2. Cronbach's alpha is a lower bound estimate of the internal reliability of a UNIDIMENSIONAL scale. If your measure has more than one factor/dimension, alpha is not appropriate.
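The standard alpha formula is easy to compute directly, which makes the point concrete: apply it per unidimensional subscale, not to the whole multi-factor pool. A sketch on simulated data (function name and toy numbers are illustrative):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / variance
    of the total score). Assumes the items form ONE unidimensional subscale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy subscale: four parallel items sharing one common factor plus noise.
rng = np.random.default_rng(2)
t = rng.standard_normal(100)
sub = np.column_stack([t + 0.6 * rng.standard_normal(100) for _ in range(4)])
alpha = cronbach_alpha(sub)
```

Running this on each factor identified by the EFA, rather than on the full 16-item pool, avoids the multidimensionality problem raised above.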
We have just published an article about scale purification, i.e. the process of eliminating items from multi-item scales. We have used the example of SCM, but our framework can be applied to any other discipline. Download: https://doi.org/10.1108/SCM-07-2016-0230 (or request via my ResearchGate page).