You might take a look at the receiver operating characteristic (ROC) curve. The area under this curve (AUC) is a measure of the overall accuracy of the test. Youden's statistic, J = sensitivity + specificity - 1, summarizes the performance of the diagnostic test at one specific cut-off value.
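A minimal sketch of both quantities in Python (standard library only). It assumes that a higher score means "more likely diseased", that a result is called positive when score >= cut-off, and that both classes are present. The AUC is computed as the probability that a randomly chosen diseased case scores higher than a randomly chosen non-diseased one (ties count 1/2). The function names and toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a diseased case > score of a non-diseased case),
    counting ties as 1/2. labels: 1 = diseased, 0 = non-diseased."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_best_cutoff(scores, labels):
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1.
    A result is positive when score >= cut-off; ties keep the lowest cut-off."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best


# Hypothetical test results and true disease status.
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
print("AUC:", roc_auc(scores, labels))
print("best cut-off, J:", youden_best_cutoff(scores, labels))
```

The pairwise AUC is O(n_pos * n_neg), which is fine for illustration; for real data a library ROC routine is more convenient.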
A test-retest study is a paired design, just like running the same sample with Method A and then with Method B.
In this case you can use several procedures:
• Paired t-test (H0: mean of differences = 0.0), with the limitation that in large samples even very small, practically irrelevant differences become statistically significant.
• Intraclass correlation coefficient (with a confidence interval)
• Lin’s concordance correlation coefficient (with a confidence interval)
• Regression analysis between the two procedures or methods, testing H0: slope β = 1.0 and H0: intercept α = 0.0 (the "perfect" regression line)
• Bland and Altman procedure (somewhat like a residual analysis of the regression; it is partly numerical and partly graphical)
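The point estimates behind the procedures listed above can be sketched in plain Python (standard library only). This is a minimal sketch that gives estimates only, without p-values or confidence intervals. Several ICC forms exist; the one-way random-effects ICC(1,1) used here is an assumption, and the right form depends on your design. The 1.96 multiplier for the Bland and Altman limits of agreement assumes roughly normal differences. All function names and data are hypothetical.

```python
import math
import statistics

# Hypothetical test-retest data: the same 5 samples measured twice.
test = [10.0, 12.0, 9.5, 14.0, 11.0]
retest = [10.5, 11.5, 10.0, 14.5, 11.5]


def paired_t(x, y):
    """t statistic for H0: mean of differences = 0 (df = n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def icc_oneway(x, y):
    """ICC(1,1): one-way random effects, single measurement, k = 2 occasions."""
    n, k = len(x), 2
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand = statistics.mean(subject_means)
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(x, y, subject_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def ols_line(x, y):
    """Least-squares slope and intercept of y on x; ideal values are 1 and 0."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def bland_altman(x, y):
    """Mean difference (bias) and approximate 95% limits of agreement."""
    d = [a - b for a, b in zip(x, y)]
    bias, sd = statistics.mean(d), statistics.stdev(d)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


print("paired t:", paired_t(test, retest))
print("ICC(1,1):", icc_oneway(test, retest))
print("Lin's CCC:", lin_ccc(test, retest))
print("slope, intercept:", ols_line(test, retest))
print("bias, lower LoA, upper LoA:", bland_altman(test, retest))
```

Ordinary least squares treats the first method as error-free; when both methods carry measurement error, Deming or Passing-Bablok regression is often preferred for the β = 1, α = 0 tests.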
A Pearson correlation coefficient should never be used for this purpose. The attached papers give several reasons, but I think the most powerful one is that with Pearson's coefficient you are looking for a co-relation between two independent variables. In a paired (test-retest) design, the same sample gives you two records that are already correlated, so the variables lose their independence.
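Another reason often given in the agreement literature is that Pearson's r measures linear association, not agreement. The minimal Python sketch below uses hypothetical data in which the retest reads twice as high plus 5 units; r still comes out at 1, even though the two measurements clearly disagree.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


test = [10.0, 12.0, 9.5, 14.0, 11.0]
# Hypothetical retest with a proportional and a constant bias.
retest = [2 * v + 5 for v in test]
mean_diff = statistics.mean([b - a for a, b in zip(test, retest)])

print("Pearson r:", pearson_r(test, retest))
print("mean (retest - test):", mean_diff)
```

A paired t-test, Lin's CCC or a Bland and Altman plot on the same data would expose the systematic difference that r hides.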
To calculate confidence intervals for the intraclass correlation coefficient and the concordance correlation coefficient (and also the paired t-test and Bland and Altman plots), you can use the MedCalc software, which offers a 15-day free trial (http://www.medcalc.org/).