If we take a paper-based survey that is validated and put it online, does it need to be validated online as well, or does the format in which the questions are delivered matter?
It may not be the paper-versus-online dichotomy that is the problem. The validation of any instrument is for a particular population, and the reliability and validity may or may not hold for another population. Even if the instrument is used online exactly as it was in the original version, you would do well to collect reliability and validity data with that population. That way, if it comes up, you can defend your process.
James E. McLean I respectfully disagree. If an instrument is not valid for a particular population, the instrument is not valid, period. An instrument may do a better job of separating participants on the construct of interest in one population relative to another, but this does not imply the measure is "more valid" for the first population. An implication of your comment is that one must always assess reliability and validity every time an instrument is used. It is possible that one's sample comes from a different population than another sample, whether you're using the same instrument twice on paper or once on paper and once online.
Dear Blaine Tomkins, please allow me to illustrate with a simple example. In fact, I will use just one item from the first edition of a very well-known intelligence test for children. On the vocabulary test, the child was asked to define "brim." Of course, the test was given verbally on the individually administered intelligence test. A response of "part of a hat" was awarded two points. A response of "a fish" was awarded one point. Other responses were awarded zero points. Children from the North and Midwest were most likely to respond "part of a hat," earning two points, while children from the South were most likely to respond "fish," earning only one point. It is easy to see that the item was valid for Northern and Midwestern children but not for Southern children, as a bream is a very common fish in the South and is pronounced exactly the same. There are many examples of this in the literature, often featuring different age groups, racial groups, or other populations. While I agree that reliability is a prerequisite to validity, it is also very possible for the reliability of an instrument to vary greatly from population to population.
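For concreteness, the scoring rubric Jim describes can be sketched in code. This is a minimal, hypothetical sketch: the `score_brim` function and the sample responses are invented for illustration, not taken from the actual test materials.

```python
def score_brim(response: str) -> int:
    """Score a child's definition of 'brim' per the rubric described above:
    2 points for the hat-related sense, 1 point for the fish sense,
    0 points otherwise. (Keyword matching here is a simplification.)"""
    response = response.lower()
    if "hat" in response:
        return 2
    if "fish" in response:
        return 1
    return 0

# Invented samples: Northern/Midwestern children tend toward the hat sense,
# Southern children toward the fish sense (a bream, pronounced the same way).
north_midwest = ["part of a hat", "edge of a hat", "a fish"]
south = ["a fish", "a fish", "part of a hat"]

mean_nm = sum(score_brim(r) for r in north_midwest) / len(north_midwest)
mean_s = sum(score_brim(r) for r in south) / len(south)

# Identical vocabulary knowledge, different scores: the item systematically
# penalizes the Southern group because of a regional homophone.
print(f"North/Midwest mean: {mean_nm:.2f}, South mean: {mean_s:.2f}")
```

The point of the sketch is that the group difference in scores arises from the scoring rule interacting with regional language use, not from any difference in the construct being measured.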
Hello Jim. Excuse my long delay. I hadn't noticed you replied to my previous comment until just recently.
The situation you describe above is very interesting and has prompted me to think more about measure validity. However, I would not consider this example as reflecting a scale being more valid for one population than another. Rather, I consider this item on the vocabulary test to be a bad item. I'll explain.
When we create items for a new instrument intended to measure some construct (like vocabulary), each item's validity depends on its asking only a single question. That is, we must avoid double-barreled questions, which are a known threat to construct validity.
Incidentally, I encountered this very situation years ago when conducting vocabulary tests with aphasia patients. The problematic item was "quay" - an uncommon nautical term pronounced like "key".
As I see it, the vocabulary question asking the child "what is a [brim]?" is, despite being verbally administered, actually a double-barreled question. There are multiple questions being asked within this item, since there are multiple definitions of [brim]. It is a bad item and a threat to the construct validity of the test. The researchers would do well to omit homophones and/or change the response scale or format.