A new version of a self-assessment scale for measuring the security level of a company has been developed, and I would like to test its reliability and validity against the older instrument.
The instrument consists of attributes, each rated on a 4-point scale. The attributes are grouped into higher-level categories, which are the same in the old and new versions of the instrument. Data collected: a single rater completed both versions (old and new) of the instrument concurrently, so both responses measure the same security level.
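Because the higher-level categories are shared between versions while the individual attributes may differ, one way to pair the two responses is to aggregate each version's attribute ratings into one score per category. A minimal sketch, assuming a category score is simply the mean of its attribute ratings; the category names and ratings below are hypothetical, not taken from the actual instrument:

```python
from statistics import mean

# Hypothetical ratings from the single rater: category -> 1-4 attribute ratings.
# Category names and values are illustrative only.
old_version = {
    "access_control": [3, 2, 4],
    "incident_response": [2, 2],
    "network_security": [4, 3, 3, 4],
}
new_version = {
    "access_control": [3, 3],
    "incident_response": [2, 1, 2],
    "network_security": [4, 4, 3],
}


def category_scores(ratings):
    """Average the 4-point attribute ratings within each higher-level category."""
    for category, values in ratings.items():
        if not values or not all(1 <= v <= 4 for v in values):
            raise ValueError(f"missing or out-of-range rating in {category!r}")
    return {category: mean(values) for category, values in ratings.items()}


old_scores = category_scores(old_version)
new_scores = category_scores(new_version)

# Pair the two versions on the categories they share.
paired = [(c, old_scores[c], new_scores[c]) for c in sorted(old_scores) if c in new_scores]
for category, old, new in paired:
    print(f"{category}: old={old:.2f} new={new:.2f}")
```

Averaging (rather than summing) keeps category scores on the original 1 to 4 range even when the two versions contain a different number of attributes per category.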
I would like to know whether it is possible (and valid) to conclude from these responses that the new instrument is comparable to the old one. I think I should test concurrent criterion validity, with the old version as the gold standard, which leads me to Spearman's correlation, but I am still at a loss as to which statistical method is best suited to this type of analysis.
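Spearman's correlation is the Pearson correlation computed on ranks, with tied values given the average of their ranks. A minimal stdlib-only sketch, assuming the paired units are the shared higher-level categories and using hypothetical scores; `scipy.stats.spearmanr` computes the same coefficient (plus a p-value) if SciPy is available:

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: one version's scores are all equal")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired samples of equal length >= 3")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical category-level scores: old version (criterion) vs new version.
old_scores = [3.0, 2.0, 3.5, 1.5, 2.5, 4.0]
new_scores = [3.0, 1.5, 3.5, 2.0, 2.5, 4.0]

rho = spearman_rho(old_scores, new_scores)
print(f"Spearman's rho = {rho:.3f}")
```

With a single rater, n here is the number of categories, so the estimate rests on few data points. A high rho only shows that the two versions rank the categories in a similar order; it does not show that they produce the same absolute scores.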