Suppose we are doing a pre-post study and look at the paired sample correlations, for example in a paired t-test. What do these values really mean?
Do both a paired t-test and a paired means analysis. Although these two tests are not the same, they can be combined to check and recheck for inferential errors: Type I and Type II. The procedure follows:
PRE-POST
Assume that a sample of size n1 was taken at the beginning of the study at time t1. Subsequently, a second sample of size n2 was taken at time t2. The query is: what is the meaning of the correlation between these two samples? Generally, this is not done because n1/t1 = X1 and n2/t2 = X2 are both independent variables. However, it is possible to run a correlation for X1 and X2, since the two samples may not be ordinary XY-Cartesian data points but XYZ-Euclidean data points, where Z = time, that have been forced into a 2D picture in the form of a time series analysis.
TEST STATISTIC
The test used is the t-test because we are dealing with sample analysis. Recall that the t-test is used when the standard deviation is estimated from a sample, while the Z-test is used when the population standard deviation is known. In a pre-post analysis, there are 3 steps in the testing:
(1) t = (x-bar - mu) / (S / sqrt(n)) at time t1
(2) t = (x-bar - mu) / (S / sqrt(n)) at time t2
If there is any change between Pre and Post, the t-statistics computed at t1 and t2 will be different. This is paired means analysis. The next step is to answer the query: is this difference significant? Assume that the confidence level is 95% and that the one-tailed t-critical is 1.64 at infinite sample size (for the sake of this discussion; the two-tailed value would be 1.96). The t-equation for the paired study of the two periods follows:
The notation (*) is used to differentiate this t-statistic from those in equations (1) and (2).
The following rejection criterion is used:
Ho: t* < t-critical (1.64) at the 0.95 confidence level
H1: t* > t-critical (1.64) at the 0.95 confidence level
The decision rule is thus phrased: "Reject Ho if t* > 1.64 at the 0.95 confidence level, indicating that the difference between t-pre and t-post is statistically significant."
However, if t* < 1.64, it means there is no significant evidence that the stimulus was effective, i.e. the policy or treatment cannot be shown to have worked.
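Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores in `pre` and `post`, the reference mean `mu`, and the helper names `one_sample_t` and `reject_h0` are all made up for the example, and the cutoff 1.64 is the large-sample value used in this discussion (with only five observations the exact critical value would be larger).

```python
import math
import statistics


def one_sample_t(sample, mu):
    """t = (x-bar - mu) / (S / sqrt(n)), as in equations (1) and (2)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu) / (s / math.sqrt(n))


def reject_h0(t_stat, t_crit=1.64):
    """One-tailed decision rule from the text: reject Ho if t > t-critical."""
    return t_stat > t_crit


# Hypothetical scores at t1 (pre) and t2 (post); mu is an assumed reference mean.
pre = [10.0, 12.0, 11.0, 13.0, 14.0]
post = [13.0, 15.0, 13.0, 16.0, 18.0]
mu = 10.0

t_pre = one_sample_t(pre, mu)
t_post = one_sample_t(post, mu)

print(f"t at t1 = {t_pre:.3f}, reject Ho: {reject_h0(t_pre)}")
print(f"t at t2 = {t_post:.3f}, reject Ho: {reject_h0(t_post)}")
```

The two statistics differ because the sample means and standard deviations differ between t1 and t2; whether that difference is itself significant is the separate question addressed by t*.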
CORRELATION
In case of testing the correlation, the t-equation for correlation is:
(4) t_r = r * sqrt(n - 2) / sqrt(1 - r^2)
where r is the Pearson correlation, which is given by:
(5) r = b(Sx) / Sy
where
b = slope of the regression line
Sy = standard deviation of Y at t1 (do the same for t2)
Sx = standard deviation of X at t1 (do the same for t2)
One can compare t_r at t1 with t_r at t2. A bootstrap procedure may help by generating many resamples.
Equation (4) may be used for X1:Y1 at t1 and for X2:Y2 at t2, and the two results compared. As for control-treatment pre-post studies, follow the steps in equations (1) - (3) and (6).
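As a rough sketch of equations (4) and (5), the snippet below gets r from the regression slope and then converts it to a t-statistic. The data in `x` and `y` and the function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
import statistics


def slope(x, y):
    """Least-squares slope b of y regressed on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def pearson_r(x, y):
    """Equation (5): r = b * Sx / Sy."""
    return slope(x, y) * statistics.stdev(x) / statistics.stdev(y)


def t_for_r(r, n):
    """Equation (4): t_r = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical X:Y pairs observed at one time point.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
t_r = t_for_r(r, len(x))
print(f"r = {r:.3f}, t_r = {t_r:.3f}")
```

Running the same functions on the X2:Y2 pairs from t2 gives the second t_r for the comparison described above.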
PRE-POST CORRELATION (?): Yes, possible.
Can the results at t1 and t2 be correlated with one another? That is, take Y at t1 and Y at t2 and call them the series Y1: {y1, y2, ..., yn} and Y2: {y(2)1, y(2)2, ..., y(2)n}. Now put these two series in the form of X and Y, say Y1 = X and Y2 = Y, and run a regression. Then do the same thing with the series reversed: Y1 = Y and Y2 = X. If the slope is not zero, the two slopes will generally differ (their product equals r^2), but r and its t-statistic from equation (4) are the same in both directions. What does it tell us? Ultimately the same thing as the steps outlined above, but in a different manner: confirming changes in the result of treatment at PRE and POST. Here, we will have an r created by the changes (if any) between PRE and POST treatment. It is a shorter procedure involving equations (4) and (5), but double-check it by flipping X:Y at t1 and t2. Under this approach, the hypothesis tested concerns the slope b, i.e. Ho: b = 0 and Ha: otherwise.
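The flip check can be tried out directly. The sketch below, using made-up paired scores `y1` (PRE) and `y2` (POST) and a hypothetical helper `slope`, regresses in both directions and recovers r from each slope via equation (5). It assumes the two series are paired observation by observation and have the same length.

```python
import statistics


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical paired scores: same subjects measured at t1 and t2.
y1 = [10.0, 12.0, 11.0, 13.0, 14.0]
y2 = [13.0, 15.0, 13.0, 16.0, 18.0]

b_forward = slope(y1, y2)  # Y1 = X, Y2 = Y
b_reverse = slope(y2, y1)  # Y1 = Y, Y2 = X

# Equation (5) applied in each direction.
r_forward = b_forward * statistics.stdev(y1) / statistics.stdev(y2)
r_reverse = b_reverse * statistics.stdev(y2) / statistics.stdev(y1)

print(f"slopes: {b_forward:.4f} and {b_reverse:.4f}")
print(f"r: {r_forward:.4f} and {r_reverse:.4f}")
print(f"product of slopes = {b_forward * b_reverse:.4f}, r^2 = {r_forward ** 2:.4f}")
```

The slopes differ, but both directions give the same r, so the flip works as a check on the calculation rather than as a second independent test.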
INFERENTIAL ERROR SAFEGUARD
Type I error occurs when the researcher wrongly rejects a true null hypothesis. In this case, the researcher may insist that t* is statistically significant because t* > 1.64. We need to double-check that we have not been stubborn and that the data or calculation has not been misread. Since t* is the result of a paired means study in the form of a comparison, the way to double-check against Type I error is to confirm it with a paired mean difference analysis: d-bar analysis. Note that d-bar has its own t-equation, where d-bar is the mean of the pairwise differences and S is their standard deviation:
(6) t(d-bar) = d-bar / (S / sqrt(n))
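A minimal sketch of equation (6) follows, assuming each subject is measured once at PRE and once at POST so that the differences can be taken pair by pair. The scores and the function name `paired_t` are hypothetical.

```python
import math
import statistics


def paired_t(pre, post):
    """Equation (6): t(d-bar) = d-bar / (S / sqrt(n)), with S the SD of the differences."""
    d = [after - before for before, after in zip(pre, post)]
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation of the differences
    return d_bar / (s_d / math.sqrt(len(d)))


# Hypothetical paired scores for five subjects.
pre = [10.0, 12.0, 11.0, 13.0, 14.0]
post = [13.0, 15.0, 13.0, 16.0, 18.0]

t_d = paired_t(pre, post)
print(f"t(d-bar) = {t_d:.3f}")
```

If t(d-bar) and t* lead to the same decision, the rejection of Ho is less likely to be the product of a misreading.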
Type II error occurs when the researcher fails to reject a false null hypothesis, thus rejecting the alternative hypothesis despite the evidence supporting his/her position. In this case, t* > 1.64 but the changes between Pre-treatment and Post-treatment were dismissed as insignificant, i.e. the beta probability is greater than zero. A safeguard against this type of inferential error is to introduce the following experimental design. Sum: (n1 + n2) = n*, where n*: {x1, x2, ..., x(n*)} and the time difference is disregarded. The two samples are treated as one sample group, n*. Use the basic t-test:
t = (x-bar - mu) / (S / sqrt(n*)) with the following rejection criterion:
Ho: t(n*) < t-critical (1.64) at the 0.95 confidence level
H1: t(n*) > t-critical (1.64) at the 0.95 confidence level
This experimental design presupposes that the sample group n* is consistent (homogeneous); the null hypothesis should assert that it is not homogeneous. Conduct a second set of d-bar analysis to confirm whether there is homogeneity in the sample group. If homogeneity exists, it means that the stimulus or treatment is not effective.
MEANING & INTERPRETATION
The context in which this type of test is used is the introduction of a stimulus into the system (population). The first sample n1, taken at t1, may be classified as a control group because it is taken pre-stimulation (prior to introducing a treatment, e.g. a policy or a new pharmaceutical), and the sample n2, taken at t2, is called the treatment group.
APPLICATION
Since the data collection is time-sensitive, the researcher might want to track changes over time, e.g. the consumption pattern of a certain product from t1 to t2. In some cases it is used to track changes of opinion before and after a certain stimulus (say a government policy: adjusting the interest rate, and the response of investors or exporters before and after the change in interest rate, which influences the exchange rate and in turn the volume of exports). Even in cancer treatment, the test can be used to verify the treatment regime's efficacy. The pre-post correlation test allows the researcher to test the EFFECTIVENESS of a certain stimulus (e.g. a policy tool). It answers the questions: has there been a change? If so, is that change significant?
The paired t-test is different from correlation. The paired t-test tests whether the mean of the differences within each pair is equal to zero (H0). Correlation, on the other hand, tells us whether there is any relationship between the two sets of scores.
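To make the distinction concrete, here is a small sketch with invented numbers. In the first case every subject improves by a similar amount, so the mean difference is clearly non-zero and the correlation is high. In the second case the post scores have the same mean as the pre scores but the subjects change rank, so the mean difference is zero while the correlation is strongly negative. All data and function names are hypothetical.

```python
import statistics


def mean_difference(pre, post):
    """Mean of the pairwise differences (post - pre), the quantity the paired t-test examines."""
    return statistics.mean(after - before for before, after in zip(pre, post))


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


pre = [10.0, 12.0, 11.0, 13.0, 14.0]
post_shifted = [13.0, 15.0, 13.0, 16.0, 18.0]    # everyone improves, ranks roughly kept
post_reordered = [14.0, 11.0, 13.0, 10.0, 12.0]  # same mean as pre, ranks changed

for label, post in [("shifted", post_shifted), ("reordered", post_reordered)]:
    print(f"{label}: mean difference = {mean_difference(pre, post):.2f}, "
          f"r = {pearson_r(pre, post):.3f}")
```

The two statistics move independently, which is why a paired t-test result and a paired sample correlation answer different questions about the same data.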
Usually, you would prefer a high degree of correlation between the two sets of scores. A person who had a low score before the treatment should still have a fairly low score relative to the others after the treatment, even if everyone improved, and vice versa. This indicates a consistent pattern in your data.