Developing a PRO questionnaire requires evidence that it actually measures what it purports to measure. The process includes item development, cognitive debriefing, and psychometric assessment of reliability, reproducibility, redundancy, and validity. The concepts, the vocabulary, the clustering of items into domains, and the scoring system all require special consideration. Clinical validity, however, can only be tested in patients with differing disease severity who are followed over time, so that changes due to the natural history of the disease or to the effects of treatment (both beneficial and adverse) can be taken into account.
The two terms differ slightly, although most people do not pay much attention to the distinction. When we develop a standard tool, in this case a PRO measure, we need to divide the study into two phases: tool development and tool evaluation. In the first phase, tool development, the tool is devised through conceptualization and item generation and then tested for three psychometric properties: practicality/usability, validity, and reliability. The fourth property, responsiveness, is not practical to assess at this stage. The psychometric test is therefore a preliminary assessment with a limited number of patients. In Phase 2, tool evaluation (sometimes called 'tool validation') is required to fully assess the newly developed instrument. This is done by comparing it, in a large patient group, with a widely accepted measure (or gold standard), which may or may not be the same as the one used in the reliability test. If no gold standard can be found, the tool is evaluated on its own. Responsiveness, together with confirmed validity and reliability, can be measured during this phase. In short, a psychometric property test is not equivalent to a validation test.
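As a rough illustration of two of the properties discussed above, the sketch below computes Cronbach's alpha (one common index of internal-consistency reliability) and the standardized response mean (one common index of responsiveness). Neither statistic is named in this thread; both are assumptions chosen for illustration, and the function names and data layout are hypothetical.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows.

    `responses` is a list of rows, one per patient, each row holding
    that patient's score on every item (hypothetical layout).
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across patients
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each patient's total score
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the standard deviation of the change.

    `baseline` and `followup` are paired total scores for the same
    patients at two time points (hypothetical layout).
    """
    changes = [after - before for before, after in zip(baseline, followup)]
    return mean(changes) / stdev(changes)
```

In the two-phase scheme described above, something like `cronbach_alpha` could be run on the small Phase 1 sample, while `standardized_response_mean` only becomes meaningful in Phase 2, once the same patients have been scored at two time points.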