I have two separate groups which are asked, under different conditions, to sort a list of items in order of preference (building a hierarchy). The total number of items is 5 and the data is categorical (nominal).
I want to do a comparative analysis between the two hierarchies and measure how significantly they differ from each other. What analysis should I use?
The first approach I suggested is necessary in any case, because it enables you to compare survey 1 and survey 2 for each item, and then for the aggregated scores. You can then compare all the items at once to see whether preference differs significantly among them; this is possible using the Friedman test for several related variables. At this level, however, you have to work directly with the raw data in SPSS, not with the summary contingency table. First run Friedman for the entire data set (the two surveys combined), then split the file (Data menu) on the grouping variable (survey) so as to run Friedman for surveys 1 and 2 separately.

As you can see, this is pure mathematics; in social surveys, emphasis should also be placed on visual appreciation of trends, which is why designing a contingency table like the one we agreed on is essential before any calculation of P-values. You can complement Friedman (which compares more than two related variables) with a pairwise comparison test for two variables (available under the non-parametric test group) to compare, within each item, survey 1 against survey 2. The Friedman output can be compared with that obtained from Chi-Square. You can also assess the consistency with which respondents rank the various items using Cronbach's alpha reliability coefficient, and the relationships between items (for instance, does preference for A imply preference for B?) using inter-item correlation coefficients; do this for the entire data set as well as separately for surveys 1 and 2, using the Split File or Select Cases function, to appreciate the differences.

These tests are complementary, and this triangulation approach will give a good appreciation of the variability in your data if it is done properly. You can go ahead and collect your data and we will see after that. Regards.
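For readers without SPSS, the Friedman step above can be sketched in Python. This is a minimal illustration with made-up ranks for three items from five respondents (the values are assumptions, not real survey data):

```python
# Minimal sketch of the Friedman test on raw rank data, using scipy
# instead of SPSS. The ranks below are hypothetical.
from scipy.stats import friedmanchisquare

# Each list holds one item's rank as given by five respondents;
# each respondent's row across the three items is a permutation of 1..3.
item_a = [1, 1, 2, 1, 3]
item_b = [2, 3, 1, 2, 1]
item_c = [3, 2, 3, 3, 2]

stat, p = friedmanchisquare(item_a, item_b, item_c)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")
```

Splitting by survey, as described above, would simply mean running this once per group on that group's rows.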
I thought about correlation tests, but these won't tell me anything about the two hierarchies and whether they differ from each other.
I could use descriptive statistics, basically use a point system to build the hierarchies and then report the results, but I want to do it a bit more scientifically if there is a way.
I also thought of using Kruskal-Wallis or Mann-Whitney, but I can only apply these to each item of the hierarchy; they can't tell me whether the overall order of the hierarchy is similar to or different from the other group's.
I have no idea if there is a test of variance for hierarchies...
John, thank you very much for your answer. I think I should have been clearer, and I am sorry about that; it sounded understandable in my head, but I didn't convey my message properly. By my understanding, correlations are a test between two variables, which I don't think is the case here.
Put simply, I have two groups of randomly assigned participants (let's say 50 and 50 people).
I give them a list (bananas, apples, carrots, strawberries, oranges) and ask them to rearrange it based on their preference. So what I have at the end is two groups, each of which returned its own hierarchies of these items.
How do I test whether the order of these items (the hierarchies) is the same for the two groups? I can't think of a proper methodology to assert this.
Since you are interested in whether the two groups are similar or not, it suffices to obtain the rank correlation as a measure of co-movement or concordance and test it for significance. Kendall's tau is valid too.
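One hedged way to read this suggestion: summarise each group's hierarchy by the mean rank per item, then correlate the two summary orderings. The mean ranks below are invented for illustration:

```python
# Sketch: rank correlation between the two groups' aggregate orderings.
# The mean ranks are assumed values, in the order:
# bananas, apples, carrots, strawberries, oranges.
from scipy.stats import kendalltau, spearmanr

mean_rank_group1 = [1.4, 2.1, 3.0, 3.9, 4.6]
mean_rank_group2 = [1.8, 1.9, 3.2, 4.1, 4.0]

tau, p_tau = kendalltau(mean_rank_group1, mean_rank_group2)
rho, p_rho = spearmanr(mean_rank_group1, mean_rank_group2)
print(f"Kendall tau = {tau:.2f} (p = {p_tau:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```

Note that aggregating to mean ranks discards the within-group variability that later posts in this thread discuss.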
Dear Colleague,
First, you should use a nonparametric method, but please give more information on your data: for example, is the response variable continuous or categorical? What are the explanatory variable(s)? Generally, if the response variable is categorical, you can use a non-parametric method. If you want to compare two independent groups on a categorical response variable, you should use the Mann-Whitney U test.
Please feel free to contact me.
Regards
Dr. Ecevit EYDURAN
Recently, I used a partial-order generalization of rough set theory. Several statistical tools are also available.
I have to agree with Dr. Ecevit EYDURAN; a non-parametric test is more adequate in this case. Maybe a Mann-Whitney U test is enough. You could use Kruskal-Wallis too, but with 2 groups it is not necessary.
Dear Michael Tsikerdekis,
First, I agree with Ecevit Eyduran that we all need to be clear about your data. With your fruit example: i) is it quantitatively defined (numerical) or qualitative (apples, bananas...)? ii) is it classified in groups or only ranked, as in your example, by "preferences"? iii) what is the number in each sample (5 in your case), which matters for choosing a parametric or non-parametric test?
iv) do you want descriptive statistics (comparison of distributions, independence of groups, or correlation), or something more complex, such as "a preference" of subjects in two groups and a comparison of the two groups, and so on?
I think that in each case some statistical methods are appropriate, while in others they cannot be used.
Finally, if a transformation of the qualitative data into quantitative data (banana = 1) is possible and appropriate for the analysis, it may be more useful: "a preference" can become a "score" (1, 2, 3...), or equivalently a "weight" attributed to the data, which can then be used in tests (e.g. factorial analysis).
Regards
Didier J
Thank you all for the answers! You've been really helpful. :-)
I think both the Hamming index and the rank correlation coefficients are problematic, because the data I get from each participant are more complex. Each participant from group A gives me back a list (e.g. [b,a,c,o,t]) which can be compared with the list of each participant from group B (e.g. [a,b,c,o,t]). The way I see it, and please correct me if I am wrong, is that if I want to use the Kendall tau distance, I would have to cross-compare each list from one group with each participant's list from the other group and then average the results, and do the same for the rest of the participants. Then, having a mean Kendall tau distance for each participant of group A against group B (and vice versa), I could use a t-test to determine whether the Kendall tau distance between the two groups differs (and therefore whether the original rank-ordered data differ). Like John said, Kendall tau distance is a metric, and I don't know whether the whole approach is scientific.
Another solution is to use, as Tatiana suggested, multinomial regression analysis, or, as Ecevit suggested, Mann-Whitney U, provided that I code all possible combinations from the hierarchies into categorical data. The problem with this is that with 5 items arranged in every possible order I end up with 5! = 120 possible combinations, i.e. 120 unique categories, each representing one possible ordering. Chances are I will end up with the groups being significantly different from each other, but I don't think this will be accurate, because it completely ignores distances: the lists [a,b,c,o,t] and [a,b,c,t,o] are more similar than [a,b,c,o,t] and [c,o,t,b,a], yet as categories each becomes a unique number, and any "similarity" is ignored.
To be honest I am more in favor of the first method, but can it really work, or be revised so that it becomes solid?
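The cross-comparison idea described above can be sketched as follows (toy lists reusing the b/a/c/o/t example; the final t-test step is left out, and whether that step is statistically sound is exactly the open question):

```python
# Sketch: for each participant in group A, average the Kendall tau
# distance (number of discordant item pairs) to every participant in
# group B. The lists are toy data.
from itertools import combinations

def kendall_tau_distance(r1, r2):
    """Count item pairs ordered differently in the two rankings."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

group_a = [["b", "a", "c", "o", "t"], ["a", "b", "c", "o", "t"]]
group_b = [["a", "b", "c", "t", "o"], ["c", "o", "t", "b", "a"]]

mean_dist_a = [
    sum(kendall_tau_distance(r, s) for s in group_b) / len(group_b)
    for r in group_a
]
print(mean_dist_a)
```

With 5 items the distance ranges from 0 (identical order) to 10 (fully reversed), so the per-participant means are directly comparable.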
Didier, I should have specified this from the beginning :( . The data is definitely categorical; that's why I used the example of fruits. The problem is that each participant's response is an ordered set (e.g. [bananas, apples, carrots], while another participant's is [apples, carrots, bananas]).
So I have 2 groups, with each participant giving an answer like the two examples above.
I need to test whether the answers from the two groups are similar or not, and how similar.
Dear Sir
You may use the Co-integration test (E-Views software or from SPSS package) for comparative analysis.
For your last request, you should use Spearman rank correlation to determine the similarity of the answers given by the two independent groups, with the help of the SPSS statistical package. Please click the following link to see how to do a Spearman rank correlation:
http://statistics.laerd.com/spss-tutorials/spearmans-rank-order-correlation-using-spss-statistics.php
Good luck.
Ecevit EYDURAN, Assist.Prof.
Manuel, indeed I think you are right that Hamming distance could be good for testing the similarities. But the result will still be a comparison of one individual's answer against another's. The question is how I can do this for groups.
As an example imagine that your table looks like this:
PARTICIPANT_ID | GROUP | ANSWER
1 | 1 | (1,2,3,4,5)
2 | 1 | (1,2,3,5,4)
3 | 2 | (1,2,3,4,5)
4 | 2 | (3,4,5,2,1)
How do I establish the level of similarity between the answers of group 1 and group 2?
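One simple summary for exactly this toy table is the mean pairwise Hamming distance between the groups. This is only a sketch of one candidate measure, not a significance test:

```python
# Mean between-group Hamming distance for the four toy answers above.
def hamming(r1, r2):
    """Count positions where the two rankings disagree."""
    return sum(a != b for a, b in zip(r1, r2))

group1 = [(1, 2, 3, 4, 5), (1, 2, 3, 5, 4)]
group2 = [(1, 2, 3, 4, 5), (3, 4, 5, 2, 1)]

between = [hamming(r, s) for r in group1 for s in group2]
print(between, sum(between) / len(between))
```

Here the four cross-group distances are 0, 5, 2 and 5, giving a mean of 3.0; later posts discuss why Hamming alone can be misleading.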
Hi! Well, I am more of an ecologist, but anyway... I was wondering if you could build a matrix with 5 variables (bananas, apples, carrots, strawberries, oranges), where each person is a case and you reverse the order of preference (5 becoming the highest "score" for a variable). Then you calculate a triangular matrix of something like Kendall's rank correlations between all cases. Now you can use something like ANOSIM or PERMANOVA to test for differences between the two groups using the correlation matrix directly (they use permutation tests to calculate a P-value; ANOSIM is strictly non-parametric, and PERMANOVA can even use Monte Carlo methods if the number of unique permutations is not large enough). I am not 100% sure this can be done, but I can look at this better later... I just thought of this now and don't have a lot of time on my hands at this moment, sorry :).
Michael, there seem to be two different 'steps' to your analysis; actually, I believe Manuel already addressed both of them.
(1) Evaluating the (dis)similarity between your groups is basically a distance-measurement problem. Each possible hierarchy can be considered a point in a 5-dimensional (hyper)space, so each of the two tested groups can be represented as a cloud of points in this space. The distance between the groups can be summarised by the distance between the group centroids. The inter-centroid distance is of course a summary measure; it is valid, but it carries no information on the 'distributional characteristics' of the groups (e.g. overlap of the clouds, their pattern and orientation, etc.), which, however, (IMHO) are relevant.
That said, one can calculate the centroid distance for both groups in your example above. A word of caution: Hamming distance seems appropriate; nevertheless, it might be reasonable to consider some others as well.
The issue with the inter-centroid distance probably remains that, although objective, it is still likely to be somewhat difficult to interpret.
(2) The second step would be testing the overall difference between the clouds. A bootstrapping approach (as suggested by Manuel) seems the way to go, as it will eventually provide a significance estimate of the between-groups difference. Here (now just thinking aloud :) ) one could designate one group as the reference and then evaluate the probability of the other being drawn from the same population/distribution. (It would probably make less sense to test both groups against a cloud drawn from the uniform distribution in the hyperspace.)
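The two steps just described can be sketched together: measure the inter-centroid distance, then judge its size by randomly relabelling participants. The rankings below are toy data, and the L1 centroid distance is just one possible choice:

```python
# Sketch: permutation test on the distance between group centroids.
# Toy rankings; L1 distance between centroids is an assumed choice.
import random

def centroid(rankings):
    n = len(rankings)
    return [sum(r[i] for r in rankings) / n for i in range(len(rankings[0]))]

def centroid_distance(g1, g2):
    # L1 distance between the two group centroids
    return sum(abs(a - b) for a, b in zip(centroid(g1), centroid(g2)))

group1 = [(1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (2, 1, 3, 4, 5)]
group2 = [(3, 4, 5, 2, 1), (4, 3, 5, 1, 2), (3, 5, 4, 2, 1)]

observed = centroid_distance(group1, group2)
pooled = group1 + group2
random.seed(0)
n_perm = 2000
exceed = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    if centroid_distance(pooled[:3], pooled[3:]) >= observed:
        exceed += 1
p_value = exceed / n_perm
print(f"observed distance = {observed:.2f}, permutation p = {p_value:.3f}")
```

With only 3 + 3 participants there are very few distinct relabellings, which echoes the later discussion about permutation counts; with 50 + 50 the test becomes much more meaningful.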
Best, Jaro
Jaro Lajovic
rho sigma research and statistics
www.rosigma.si
Dear Michael,
Thanks for these details. In this case, some of the statistical methods suggested previously cannot be used.
According to Jaro
"Evaluation of (dis)similarity between your groups is basically a distance measurement problem. Each possible hieararchy can be considered a point in a 5-dimensional (hyper)space, and so each of both tested groups can be represented as a cloud of points in this space. The distance between both groups can be summarised by the distance between the group centroids. The distance between the centroids is of course a summary measure; it is valid, but does not bring information on 'distributional characteristics' of the groups (e.g. overlapping of clouds, their pattern and orientation etc.), which, however, (IMHO) are relevant."
Not being a hyperspecialist, I think this is what is called "factorial correspondence analysis" (a literal translation of the French term), and software packages make the calculations, don't they? The distance to a centroid corresponds, for me, to what I called a "weight" for the data: the point nearest the centroid is more strongly linked than the farthest point. This notion, again for me, joins Miguel's suggestion of scoring the variables (1, 2, 3...), which could be considered equivalent to the "weight" of a variable, or its distance to a centroid.
Regards
Didier
For your problem, other alternatives are:
1) Chi-Square and G statistics are used to test an association between two categorical variables, but the total sample size should be more than 150-200.
2) Kendall tau correlation, a non-parametric test like Spearman correlation, can be used if the sample size is small.
3) Multiple Correspondence Analysis is used to visualize graphically all the interactions among the levels of more than 2 categorical variables, if the sample size is about 5 or 10 times the number of variables.
4) Power analyses for the Chi-Square and G statistics can be used to determine the required sample size for two categorical variables, using the SAS statistical package.
I wish you great success.
Dr. Ecevit EYDURAN
Assist. Prof
You can make a table (group 1 vs. group 2) in terms of frequencies and percentages. Your data is not sufficient for any statistical testing. The Spearman rank correlation coefficient is for ordinal data, not for nominal.
To look at the difference between two proportions, chi-square is appropriate, but your sample size is not adequate. So better make a table, as I said, or draw a multiple/component bar diagram.
Dear friend
Would you like to compare two groups of respondents that ranked given items, or two groups of items that given respondents ranked? These require different statistical analyses. Anyway, in SPSS, consider each item as a variable, i.e. one column, and enter the rank that each person gave to the item. Suppose I ranked 5 items: first rank to item 4, second rank to item 1, third rank to item 5, and so on; every respondent takes one row in SPSS. After data entry you should decide on your main research question. Are you looking for correlations among modes of ranking? For example, do respondents who gave the first rank to item 5 also rank item 3 fifth? In that case, Spearman is one of the best choices: it lets you analyze the direction and strength of the relationship as well as its significance. If you have two groups of respondents, you will need one more variable: simply add a column for "type of respondent" and code the groups 1 and 2. In that case, I think one-way ANOVA would be better, as it can show the differences between groups. Please pay careful attention to the assumptions of ANOVA. ALL THE BEST
As is known, a good solution is a non-parametric method for categorical data analysis, such as Likert-type data. For determining an association between two categorical variables, the Chi-Square and G statistics should be used on contingency tables (r x c tables). Also, the contingency coefficient (= sqrt(chi-square / (chi-square + total sample size))) can be calculated from the chi-square.
Kind Regards
For those joining the discussion: I do plan to treat the data as categorical just in case; however, with 5! = 120 categories of answers, I will probably get a significant difference between the groups while totally ignoring any similarity between the true answers, which are ordered sets or arrays (a single answer is of the form [a,b,c,d,e]).
Manuel, John and Jaro described a process which will allow for an analysis of the similarities. I need, however, some pointers on the overall process. I found that I can use Levenshtein distance along with Hamming distance, but as far as I understand I will probably get the same results. Kendall tau distance might be a promising alternative, as well as the L1 distance of ranks (sum of absolute differences). If I find software that can help me with all the cross-participant comparisons, I don't see why I couldn't try all of them and produce the mean difference in similarity between the groups. That first part is pretty much straightforward.
Calculating CIs via bootstrapping is something that I haven't done, so if you have a guide for it that would be extremely helpful.
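For reference, a percentile bootstrap for a mean between-group distance can be sketched as follows. The data and the choice of Hamming distance are assumptions for illustration; resampling each group with replacement and taking the 2.5th/97.5th percentiles of the recomputed statistic is the core idea:

```python
# Sketch: percentile bootstrap CI for the mean between-group Hamming
# distance (toy data, assumed distance measure).
import random

def hamming(r1, r2):
    return sum(a != b for a, b in zip(r1, r2))

def mean_between(g1, g2):
    return sum(hamming(r, s) for r in g1 for s in g2) / (len(g1) * len(g2))

group1 = [(1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (2, 1, 3, 4, 5)]
group2 = [(3, 4, 5, 2, 1), (4, 3, 5, 1, 2), (1, 2, 3, 4, 5)]

random.seed(0)
boots = []
for _ in range(5000):
    b1 = [random.choice(group1) for _ in group1]  # resample with replacement
    b2 = [random.choice(group2) for _ in group2]
    boots.append(mean_between(b1, b2))
boots.sort()
lo, hi = boots[int(0.025 * len(boots))], boots[int(0.975 * len(boots)) - 1]
print(f"mean distance = {mean_between(group1, group2):.2f}, 95% CI ~ [{lo:.2f}, {hi:.2f}]")
```

The same loop works for any of the distances mentioned above (Kendall tau distance, L1 rank distance) by swapping the `hamming` function.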
I had some time to check PAST and the ANOSIM procedure suggested by Miguel. Indeed, ANOSIM provides an ANOVA-style analysis of distances based on a number of distance measures. Hamming is one of them, and it is used for DNA sequence similarity analysis. Could I use this procedure to compare the two groups?
PS: Manuel, I am having trouble entering sequence data in one field in PAST (1,2,3,4,5). How did you manage to get it to work?
Hi,
You are faced with ordinal variables, because the items involved in the ranking can be ranked from the most desired (R1, for instance) to the least desired (R5). We will handle the problem in three simple stages.
Stage 1: Define each item under classification as a single variable in SPSS, PSPP, Stata, etc., and assign to each item (variable) the rank given to it by each respondent. If you have 40 respondents, you will have 40 rows in your database; and if you have 4 items, for instance, you will have 4 columns, each column standing for one variable (item).
Stage 2: Run a frequency analysis for each variable (item). Then organize the frequency outputs for all the items in a contingency table like the model shown below.
You may have a result such as:
Rank   Item 1      Item 2      Item 3     Item 4
R1     20 (50%)    10 (25%)    10 (25%)    5 (12.5%)
R2     10          10           8          5
R3      5          10           8          6
R4      2           5           4          4
R5      3 (7.5%)    5 (12.5%)  10 (25%)   20 (50%)
N      40          40          40         40
Visually, a common reader can already appreciate which item is the most preferred and which the least.
Stage 3: Calculate the P-value (significance value)
You can use the Chi-Square test of equality of proportions to compare rankings or preferences between items. Epi-Info 6.04d offers the possibility of calculating the Chi-Square significance level from such a complex contingency table. If you use the Chi-Square test, you have to work with the values (proportions) in the contingency table.
Otherwise, you can use a test for comparing several related variables; I think it is Friedman (verify). If you use such a test, you have to work with the raw data just as they were entered in the spreadsheet.
Set your alpha (e.g. 0.05 if you are working at the 95% confidence level). If P < alpha, the difference is significant and you can say that the items do not enjoy the same preference, backing your argument with their respective weights as presented in the contingency table.
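Stage 3 can also be run outside Epi-Info/SPSS. This is a minimal sketch with assumed frequencies shaped like the model contingency table (ranks R1-R5 in rows, 4 items in columns, 40 respondents per item):

```python
# Sketch: chi-square test on a rank-by-item contingency table.
# Frequencies are assumed for illustration; each column sums to 40.
from scipy.stats import chi2_contingency

table = [
    [20, 10, 10,  5],   # R1
    [10, 10,  8,  5],   # R2
    [ 5, 10,  8,  6],   # R3
    [ 2,  5,  4,  4],   # R4
    [ 3,  5, 10, 20],   # R5
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```

With 5 ranks and 4 items the test has (5-1) x (4-1) = 12 degrees of freedom.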
You can call me at (237) 74 54 16 19 or (237) 33 12 14 99
Regards and good luck.
Nana, I don't think your solution can be applied to the current problem. What I need to test is not differences of preference on items within a group, but between groups. My data has a form such as the one below, and I want to test whether the answers from Group 1 are similar to those from Group 2.
Participant ID, Group, Answer
1, 1, [1,2,3,4]
2, 1, [2,3,4,5]
3, 2, [1,2,3,4]
4, 2, [4,3,2,1]
-------------------------
On a second note, concerning the Hamming distance, it also seems to be prone to error. Consider these two pairs of answers:
1,2,3,4
4,3,2,1
and
1,2,3,4
2,1,4,3
The Hamming distance is 4 in both cases, but the second pair is more similar than the first. Is there another measure of distance that can balance this?
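This point is easy to verify: the L1 rank distance mentioned earlier in the thread (the Spearman footrule, the sum of absolute rank differences) does separate the two cases even though Hamming cannot:

```python
# Hamming vs. L1 rank distance (Spearman footrule) on the two pairs above.
def hamming(r1, r2):
    return sum(a != b for a, b in zip(r1, r2))

def footrule(r1, r2):
    return sum(abs(a - b) for a, b in zip(r1, r2))

base = (1, 2, 3, 4)
full_reverse = (4, 3, 2, 1)
two_swaps = (2, 1, 4, 3)

print(hamming(base, full_reverse), hamming(base, two_swaps))    # both 4
print(footrule(base, full_reverse), footrule(base, two_swaps))  # 8 vs 4
```

The footrule gives 8 for the full reversal but only 4 for the two adjacent swaps, matching the intuition that the second pair is closer.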
Here I am again. I am not a statistician, so I may be missing some really important underlying issue here, but I will just show you an example.
I tried something quickly using ANOSIM and PERMANOVA with this very simple example (of course we will not achieve great power with 4 cases):
Participant ID, Group, Answer
1, 1, [1,2,3,4]
2, 1, [2,1,3,4]
3, 2, [2,1,4,3]
4, 2, [4,3,2,1]
I used Kendall's tau among participants:
        1        2        3
1
2    0.667
3    0.333    0.667
4   -1       -0.667   -0.333
Then, using this correlation matrix I ran an ANOSIM:
Sample group
S1 1
S2 1
S3 2
S4 2
Global Test
Sample statistic (Global R): 0.375
Significance level of sample statistic: 33.3%
Number of permutations: 3 (All possible permutations)
Number of permuted statistics greater than or equal to Global R: 1
Of course very little can be achieved with only 3 possible permutations, so if you don't have a lot of cases, permutations won't help much.
Then I used PERMANOVA, which has the option of using a Monte Carlo method to calculate the significance, so the power in PERMANOVA is not that dependent on the number of permutations, but more on the number of replicates (on the denominator degrees of freedom). Moreover, PERMANOVA analyses the actual values in the "distance" matrix, while ANOSIM ranks the values in the "distance" matrix first (so it tests relationships among the values and not the values themselves).
PERMANOVA Results:
Data type: Correlation
Selection: All
Resemblance: Kendall rank correlation
Sums of squares type: Type III (partial)
Fixed effects sum to zero for mixed terms
Permutation method: Unrestricted permutation of raw data
Number of permutations: 999
Factors
Name|Type|Levels
group|Fixed|2
PERMANOVA table of results
Source   df   SS        MS        Pseudo-F   P(perms)   nr unique perms   P(Monte Carlo)
group    1    1.3611    1.3611    2.8824     0.345      3                 0.246
Res      2    0.94444   0.47222
Total    3    2.3056
Details of the expected mean squares (EMS) for the model
Source EMS
group 1*V(Res) + 2*S(gr)
Res 1*V(Res)
Construction of Pseudo-F ratio(s) from mean squares
Source   Numerator   Denominator   Num.df   Den.df
gr       1*gr        1*Res         1        2
Estimates of components of variation
Source     Estimate   Sq.root
S(group)   0.44444    0.66667
V(Res)     0.47222    0.68718
So this would of course need more cases to be analyzable, but I was just experimenting... :p
So, both these analyses will give you numbers and P-values for the two groups, using any measure of "difference" among cases you see fit. The thing is, I am not a statistician and I don't know if, in your particular case, this can be done (conceptually, theoretically and philosophically speaking).
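As a cross-check, the tau matrix from that toy example can be reproduced in a few lines; this is just a verification sketch using the same four answers, and could serve as the starting point for a home-made permutation test if the commercial packages are unavailable:

```python
# Reproducing the Kendall tau matrix for the four toy answers above.
from itertools import combinations
from scipy.stats import kendalltau

answers = [[1, 2, 3, 4], [2, 1, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1]]

for i, j in combinations(range(len(answers)), 2):
    tau, _ = kendalltau(answers[i], answers[j])
    print(f"tau({i + 1},{j + 1}) = {tau:+.3f}")
```

The output matches the triangular matrix in the post: 0.667 for participants 1-2, 0.333 for 1-3, -1 for 1-4, and so on.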
Miguel, what software did you use to perform the ANOSIM and PERMANOVA? I tried using PAST, but it doesn't have the Kendall tau distance. It supports Hamming and Manhattan distance for ANOSIM, but both fall short in the example that I used: there are certain differences they can't detect properly. Kendall tau distance seems better at this, but I am not an expert.
In general, variance components in an ANOVA model are used to estimate genetic parameters (heritability, repeatability, etc.) in the biological sciences. An ANOVA model can consist of a continuous (dependent) variable with discrete variable(s). This is different from your case, where there is no normality. The best choice for you is a non-parametric method. My other suggestion is that the Multiple Correspondence Analysis technique can be used effectively to visualize interactions among the levels of many discrete variables if you have a large sample size. For more information on performing Multiple Correspondence Analysis, please click the following link: http://www.unt.edu/rss/class/Jon/SPSS_SC/Module9/M9_Correspondence/SPSS_M9_Correspondence1.htm
I hope it will be useful for you.
Ecevit EYDURAN,
Editor of The JIST
Dear Michael,
When you expose your problem as below, with your main purpose being whether the "repartition" (as for a statistical distribution) of a "preference" (1 to n), with the importance of rank, is similar between the two groups, it makes me think of set theory, which I learned during my mathematical studies: at a basic level, the aim is to put two sets in relation, and we can define a common part (intersection) and relations between the groups (unilateral, bilateral).
In your case and with this representation, the two groups are different, and the "preferences" of each participant differ from one another according to the size of the common surface between the groups (the number of characteristics found in common, and the link between each member of one group and all the members of the second group...).
Perhaps some specialists in this branch of mathematics could study your problem (see the attached representation).
Best regards
Didier
Both ANOSIM and PERMANOVA are non-parametric; however, they may be influenced if there are significant differences in multivariate dispersion (which you can test using a procedure called PERMDISP). I used PRIMER-E v6 with the PERMANOVA package (http://www.primer-e.com/), which is really great and ecology-oriented, but you have to pay for a license. However, specific FORTRAN programs to run PERMANOVA and PERMDISP can be found on Marti Anderson's website: http://www.stat.auckland.ac.nz/~mja/Programs.htm
However, I think you can perform both ANOSIM and PERMANOVA in R (package vegan): http://perceval.bio.nau.edu/downloads/igert/IntroR-Course_Notes/R-Course_Day3.pdf
One thing about ecological data (the only data I am used to) is that assumptions of normality or homoscedasticity are rarely met, so the most recent multivariate methods available are non-parametric, such as these (and they allow the use of any distance or dissimilarity measure you think best illustrates your hypothesis). Anyway, please read the fundamentals of these analyses to see whether you can apply them to your data (you can even try to contact Marti herself).
Anyway, I am just an ecologist, so please give more weight to what statisticians tell you. :)
The easiest but erroneous way of doing it may be to sum responses within each group and perform a between-group comparison. This is erroneous in the sense that a respondent who scores 2251 will have the same sum of scores as one who scores 4411, while the two perceptions or appreciations of the situation are quite different.
SPSS provides a solution to your problem. Using Multiple Response Analysis, SPSS will count and aggregate, for each group, the number of occurrences of all the categories (possible responses). The same group of tests offers the possibility of crosstabulating Multiple Response Sets; you can crosstab the categories' scores with the grouping variable (the variable under which the groups, two in your case, are labelled) to distribute the MRS scores across the two groups. A Chi-Square test can then help you appreciate the difference statistically. Learn more about Multiple Response Analysis in SPSS and you will better appreciate what I am trying to explain. MRA is a simple counting technique, but it is the most accurate tool for analyzing and comparing multiple responses to a set of variables.
Regards.
Ecevit, I definitely agree that non-parametric is the way to go with my data. I tried the MCA example, but I have no idea how to enter my data so that it makes sense in SPSS. In the example, family income and class standing are given per individual. My data would need one variable (if we perceive the answer as a set), or 5 variables if each part of the set is considered separately. The problem with the latter is that this is not exactly what the participants answered. Consider these two surveys:
1. Please rearrange the list ABCDE in any order you think is preferable.
2. i. Rate A on a scale from 1-5 based on your preference.
ii. Rate B on a scale from 1-5 based on your preference.
iii. Rate C on a scale from 1-5 based on your preference.
...
In the first survey each letter gets a unique value which no other letter can take, while in the second survey someone can rate all the letters 5, or any other number, if they want. Treating the data as standalone variables ignores the fact that participants could not reuse a rating (1-5) once they had used it for another letter. I hope this makes sense.
-----
Didier, the theory of sets looks extremely promising, at least visually, but I am going to have to find a procedure in a statistical package and see if I can apply it to my problem.
-----
Nana, I created a set in SPSS (small example). The MRA will also force me to make the same assumption I describe above, but I'm having trouble understanding the results. The table in SPSS looks like this:
Group, A, B, C, D
1.0 1.0 2.0 3.0 4.0
2.0 2.0 1.0 4.0 3.0
1.0 1.0 2.0 3.0 4.0
2.0 1.0 2.0 4.0 3.0
I grouped ABCD using Analyze > Multiple Response > Define Variable Sets and set them as categorical with a range from 1-4. See the file I attached for the results.
I believe the problem is that MRA is designed for questionnaires where individuals can give multiple responses but are not forced to do so for each case. In my case, where I ask them to reorder the list, they always have to return a preference number for each of the 5 variables (4 in this example), and each number can be used only once, for one variable.
---
Please let me know if there is something that i misunderstood.
Michael, this is a typical example of an analysis of ranking responses. The process I explained in my first intervention still holds here. The error we often commit is to believe that every type of data set can be handled globally; no. Some statistical analyses are more structural, and this is often the case with ranking responses. Following your explanation, your design is OK to me, but how do we analyze the data?
First of all, handle the two surveys separately.
In SPSS, define income as a grouping variable, and then a variable for each indicator subjected to ranking. I suggest that you categorize income to shift it from scale to ordinal; for instance 1 (5-10000), 2 (10001-20000), 3 (20001-40000). Categorize based on a local socio-economic indicator, for instance the minimum salary recommended by law: a first group may be those who fall below this line, and the other groups can be defined subsequently.
Let's assume that Nana is a respondent and my income is 10000. I rank indicator A '1', indicator B '2', indicator C '4', indicator D '4' and indicator E '5' for the first survey, and then indicator A '2', indicator B '2', indicator C '4', indicator D '4' and indicator E '5' for the second survey.
My variable names are as follows: Survey for the two surveys (two categories: 1 for the first survey and 2 for the second), Incom for income, IdA for indicator A, IdB for indicator B, IdC for indicator C, IdD for indicator D and IdE for indicator E.
This is how my information will look like in the data base:
Survey  Incom  IdA  IdB  IdC  IdD  IdE
1       1      1    2    4    4    5
2       1      2    2    4    4    5
Do the same for all the respondents.
First of all forget about income and run simple crosstabulation using Survey as the independent variable (push in Row) and the five Indicators for ranking as dependent variable (Push in Column). Count within Row.
SPSS will generate five crosstabs (tables).
Design a contingency table with survey and the five indicators having a column in the table. Organize the results of the five crosstabs generated by SPSS in this your own table just by filling information (frequency and percent) in the corresponding cells and appreciate it. The table in fact will have 3 rows (the first one for the variable, the second on for the first survey and the third one for the second survey). Even without running any significance test, you can already appreciate the trend of responses. If you earlier asked SPSS to calculate Chi-Square test, you can freely appreciate the level of difference between the first and the second survey for each of the five indicators. You can now also calculate a chi-Square test comparing all the five indicators between the two surveys at once. This can be done using SPSS but you should know how to go about it. With Epi-Info 6.04d, it more straight forward.
You will be more confident if you succeed in this first phase. Once this is done, integrate income into the analysis as follows.
Crosstab Survey and the five indicators layered by income (that is, pushing Income into the third box, Layer, of the Crosstabs window) and run the crosstabs.
Redesign your contingency table, but now put income in the first column and divide each income row in two (the first row for the first survey and the second for the second survey), then fill the cells systematically with the corresponding information from the SPSS table. SPSS has also generated Chi-Square significance values, if you did not omit ticking this option.
Examine your table, comment on it and draw your conclusions.
Social science analysis, unlike experimental analysis, is much more complex because, besides mathematical calculations, it requires a lot of structural analysis and organization. Also, results shall be presented in a way that allows readers to freely appreciate the distribution of weights between categories. The comparison shall first of all be visual, before being complemented with mathematical calculations of P-values.
Regards.
Nana, I made all the calculations up to the contingency table, but it still doesn't make sense. Maybe I am doing something wrong.
Okay, so I built a dummy table where none of IdA, IdB, IdC, IdD can be 0 and they are always within {1,4}. I skipped income because I just compare the two surveys.
Survey, IdA, IdB, IdC, IdD
1, 1, 2, 4, 3
1, 2, 1, 3, 4
2, 3, 4, 2, 1
2, 4, 3, 1, 2
I ran a crosstabs analysis and the results are in the PDF I attached to this message. Based on this, the contingency table will look like this:
Survey | IdA, IdB, IdC, IdD
Survey 1 | 2, 2, 2, 2
Survey 2 | 2, 2, 2, 2
I may be wrong, but the contingency table will always have equal frequencies for each survey, because each participant always provides an answer for each variable.
I could create for each survey a contingency table based on the answers, which would look like this (each row represents a value, 1-4):
Survey 1
I , IdA , IdB, IdC, IdD
1, 1, 1, 0, 0
2, 1, 1, 0, 0
3, 0, 0, 1, 1
4, 0, 0, 1, 1
Survey 2
I , IdA , IdB, IdC, IdD
1, 0, 0, 1, 1
2, 0, 0, 1, 1
3, 1, 1, 0, 0
4, 1, 1, 0, 0
Looking at this, I can definitely say that the hierarchies (the order into which participants put A-D) differ between the two surveys. The challenge from this point on is how to compare these two contingency tables.
Your crosstabs are semantically very OK, indicating that you can define variables and run some statistical tests in SPSS. Congrats. The summary tables for survey 1 and 2 are also fine by me. From your crosstabs I could build a sample compiled contingency table for you and calculate the level of difference between the two surveys; but my problem is your sample size, just the sample size for now. How many people were effectively surveyed? Anyway, this is how your compiled contingency table will look. Using the Chi-Square test, compare survey 1 and 2 for each of the IDs and for the aggregate (the total scores for each rank; for instance, the total number of those who ranked 1 in surveys 1 and 2 respectively. Do the same for ranks 2, 3, 4 and 5).
Rank | IDA           | IDB           | IDC           | IDD           | IDE           | Aggregate
     | Surv1   Surv2 | Surv1   Surv2 | Surv1   Surv2 | Surv1   Surv2 | Surv1   Surv2 | Surv1   Surv2
1    |
2    |
3    |
4    |
5    |
Chi-Square test
Regards.
My table came out scattered. However, it is simple to explain: Rank stands as a column, and each ID and the Aggregate have a column each. Each of the columns for the IDs and the Aggregate is split into two sub-columns (Surv 1 and Surv 2).
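Compiling that table from the raw rankings is mechanical. A small sketch with invented data (three respondents per survey, ranks 1-5 over IdA-IdE):

```python
def rank_counts(data, n_items=5, n_ranks=5):
    """counts[item][rank-1] = number of respondents who gave that
    item that rank. Each row of `data` is one respondent's ranks
    for the items, in order IdA..IdE."""
    counts = [[0] * n_ranks for _ in range(n_items)]
    for respondent in data:
        for item, rank in enumerate(respondent):
            counts[item][rank - 1] += 1
    return counts

survey1 = [[1, 2, 3, 4, 5], [2, 1, 3, 4, 5], [1, 2, 4, 3, 5]]  # invented
survey2 = [[5, 4, 3, 2, 1], [4, 5, 3, 1, 2], [5, 4, 2, 3, 1]]  # invented

c1, c2 = rank_counts(survey1), rank_counts(survey2)
# Aggregate column: total number of rank-1 answers, rank-2 answers, ...
agg1 = [sum(c1[item][r] for item in range(5)) for r in range(5)]
```

Note that with a forced ranking each respondent uses every rank exactly once, so the aggregate column comes out as the same constant for both surveys; the per-indicator columns are where the two surveys can actually differ.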
The sample size is not determined yet, because I want to make sure that the survey is designed in such a way that I can properly analyze the data afterwards. I could get even 150 participants to complete the survey (maybe more, considering that it will be a volunteer sample).
Nana, thank you for the detailed answer. I now understand your analysis. Basically, I isolate each item as if it were a single variable between two groups and then perform non-parametric tests to see the difference between each variable for the two groups. That is indeed a fairly straightforward analysis. The question is: does it apply in my case, and will it stand up to peer review?
As I wrote previously, the survey can be designed in two ways:
1. Please rearrange the list of ABCDE in any order you think is preferable
2. i. Rate A on scale from 1 - 5 based on your preference
ii. Rate B on scale from 1 - 5 based on your preference
iii. Rate C on scale from 1 - 5 based on your preference
...
The difference here is that the first case imposes a restriction on participants: A, B, C, D and E must be placed into a single order. This restriction is desirable for me because I want them to set an order based on preference.
So is the separate analysis of each variable valid in my case? Could it stand to reason?
The first approach I suggested is first of all necessary, because it enables you to compare survey 1 and 2 for each of the items, and then for the aggregated scores. You can then compare all the items at once to see whether preference differs significantly among them. This is possible using the Friedman test for several related variables. At this level, however, you have to work directly with the raw data in SPSS, not with the summary contingency table. You can first run Friedman for the entire data set (the two surveys combined), then split the file (Data menu) on the grouping variable (Survey) so as to have Friedman for survey 1 and survey 2 separately. As you can see, this is pure mathematics; in a social survey, emphasis shall be placed on visual appreciation of trends, and that is why designing a contingency table like the one we agreed on is essential before any calculation of P-values.

You can complement Friedman (which compares more than two related variables) with a pairwise comparison test for two independent samples (you can find one under the non-parametric tests group) so as to compare, within each item, survey 1 against survey 2. The output of Friedman can be compared with that obtained with Chi-Square. You can also appreciate the consistency with which respondents rank the various items using the Cronbach Alpha reliability coefficient, and the relationships between items (for instance, does preference for A imply preference for B?) using inter-item correlation coefficients; this will be done with the entire data set as well as separately for surveys 1 and 2, using the Split File or Select Cases function, so as to appreciate the difference. These tests are complementary, and this triangulation approach will definitely give a good appreciation of the variability of your data if it is properly done. You can now go ahead and collect your data, and we will see after that. Regards.
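As a sanity check on what the SPSS Friedman procedure reports, the statistic is simple to compute by hand when the data are already ranks with no ties. A sketch with invented data: four respondents who all give the identical ranking, which yields the maximum possible value n(k-1).

```python
def friedman(ranks):
    """Friedman chi-square statistic and degrees of freedom.
    Each row of `ranks` is one respondent's ranking of k items
    (ranks 1..k, no ties)."""
    n, k = len(ranks), len(ranks[0])
    col_sums = [sum(col) for col in zip(*ranks)]  # rank sum per item
    q = 12.0 / (n * k * (k + 1)) * sum(s * s for s in col_sums) - 3 * n * (k + 1)
    return q, k - 1

# Four respondents, perfect agreement on the order A < B < C < D < E.
q, df = friedman([[1, 2, 3, 4, 5]] * 4)
```

With n = 4 and k = 5, perfect agreement gives q = 16, the ceiling n(k-1); the statistic is referred to the chi-square distribution with k-1 degrees of freedom.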
I had some time to test all of the above, so I created a dummy table and filled it with random cases.
The contingency table is indeed visually extremely helpful, especially with graphs. For the chi-square, though, I am not sure I can perform it, since 100% of the cells have expected counts less than 5. However, it doesn't matter, because I can conduct an analysis of variance on the actual data.
For the variable tests within the same survey, I do expect the Friedman and Wilcoxon matched-pairs tests to show statistically significant differences, because the survey is designed that way (a forced ordering of ABCDE).
For comparing the two surveys on each of their 5 variables, I think a Mann-Whitney test (for pairs) is appropriate, since the samples (surveys) are independent. This will allow me to test for differences between the surveys for each variable.
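A Mann-Whitney U for one indicator between the two independent surveys can be sketched as follows (ranks are invented; midranks handle ties, and the smaller of the two U values is returned, as in most tables):

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for two independent samples."""
    pooled = sorted(x + y)

    def midrank(value):
        first = pooled.index(value) + 1   # rank of first occurrence
        ties = pooled.count(value)        # number of tied values
        return first + (ties - 1) / 2.0   # average (mid) rank

    r1 = sum(midrank(v) for v in x)       # rank sum of sample 1
    n1, n2 = len(x), len(y)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    return min(u1, n1 * n2 - u1)

# Ranks given to one indicator by respondents in survey 1 vs survey 2.
u = mann_whitney_u([1, 2, 3], [4, 5, 6])  # complete separation -> U = 0
```

SPSS's two-independent-samples procedure reports the same U together with its P-value.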
However, in order to be able to tell whether the independent variable (the two surveys) affects all 5 of my dependent variables, I would need an equivalent of a non-parametric MANOVA test. PERMANOVA or NPMANOVA seems able to do that. The paper referenced for NPMANOVA in PAST is here for anyone interested: http://stg-entsoc.bivings.com/PDF/MUVE/6_NewMethod_MANOVA1_2.pdf
I am going to have to read it in detail and see if I can use it in my case. In any case, based on all of your advice, even without NPMANOVA I think that with the contingency table, Friedman and Mann-Whitney tests I can sufficiently show whether the two surveys produced similar hierarchies.
Congrats to the ResearchGate Mathematics and Applied Statistics link for their high sense of selflessness and sharing.
I am not sure I understood you completely, but if you want to measure the difference in agreement between different groups about their preferences for the list of items, you can use Cohen's kappa, Fleiss' kappa or Krippendorff's alpha. These links may help you understand the details.
http://stats.stackexchange.com/questions/132609/comparing-inter-rater-agreement-between-classes-of-raters
http://www.real-statistics.com/reliability/
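For the simplest two-rater case, Cohen's kappa is straightforward to compute by hand; Fleiss' kappa generalizes the same idea to many raters. A sketch with invented labels:

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same n items."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of each rater's marginal proportions.
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

kappa = cohens_kappa([1, 1, 2, 2], [1, 1, 2, 2])  # perfect agreement -> 1.0
```

Kappa runs from -1 (systematic disagreement) through 0 (chance-level agreement) to 1 (perfect agreement).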