Hello, I need some help choosing a statistical test for my data analysis.

I have Data set A, an array of 5000 numbers that are all zero, and Data set B, an array of random continuous numbers that do not necessarily follow a normal distribution. Both data sets can be plotted as histograms for visual aid. Data set A is my "ideal" and Data set B is my "measured" data. I would like to measure how similar Data set B is to Data set A, ideally as a single output figure such as a % similarity. I would then test another data set, Data set C (the same kind of array as Data set B: continuous numbers, not normally distributed), and compare its % similarity with Data set A. That would let me rank whether Data set B or Data set C is more similar to Data set A.

Some of the considerations:
- The similarity value has to account for shape (i.e., the closer the histogram of Data set B is to the single vertical line of Data set A, the higher its % similarity).
- The similarity value has to account for distance along the x axis of the histogram (i.e., the further from zero, the lower the % similarity to Data set A).
- Shape and x-axis distance must be weighted equally (neither is more important than the other).
- Because the weighting is equal, if Data set B were a vertical line at -5, it should have the same % similarity to Data set A as if it were a vertical line at +5.
- The order of the values in array B does not matter.
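One candidate measure that fits these requirements (my assumption, not an established test for this exact setup) is the root-mean-square (RMS) distance of the measured values from zero. This equals the 2-Wasserstein distance between the measured histogram and the single spike at zero. Its square splits exactly into (mean)^2, the offset from zero, plus the population variance, the spread away from a single vertical line. Both terms carry the same weight, the sign of the offset does not matter, and the order of the values is irrelevant. The conversion to a percentage, `percent_similarity` with its `scale` tolerance, is a hypothetical choice made only for illustration.

```python
import math
import statistics


def rms_distance_to_zero(values):
    """RMS distance of the sample from the all-zero ideal.

    Equals the 2-Wasserstein distance between the empirical distribution of
    `values` and a point mass at zero (the histogram of Data set A).
    """
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    return math.sqrt(sum(v * v for v in values) / len(values))


def offset_and_spread(values):
    """Split the squared RMS distance into two equally weighted terms:
    (mean)^2 is the offset from zero, and the population variance is the
    spread (zero only when the histogram is a single vertical line)."""
    mean = statistics.fmean(values)
    return mean * mean, statistics.pvariance(values)


def percent_similarity(values, scale):
    """Hypothetical mapping to a percentage: 100% at distance 0, 50% when the
    RMS distance equals `scale`. `scale` is a user-chosen tolerance in the
    data's own units (an assumption, not part of any standard test)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    d = rms_distance_to_zero(values)
    return 100.0 * scale / (scale + d)


if __name__ == "__main__":
    # Illustrative data only, not real measurements.
    examples = {
        "vertical line at +5": [5.0] * 5000,
        "vertical line at -5": [-5.0] * 5000,
        "spread of +/-1 around 0": [-1.0, 1.0] * 2500,
    }
    for name, data in examples.items():
        offset, spread = offset_and_spread(data)
        print(f"{name}: RMS={rms_distance_to_zero(data):.3f} "
              f"offset^2={offset:.3f} spread={spread:.3f} "
              f"similarity={percent_similarity(data, scale=1.0):.1f}%")
```

Because `scale` is arbitrary, the percentage itself is only meaningful relative to that choice. The ranking of B against C depends only on the RMS distance and does not change with `scale`.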

I'm essentially trying to rank data sets against the "ideal" Data set A while taking into account non-normal distributions, histogram shape similarity, distance from zero, etc. I have no idea which test would give me a % similarity to the ideal under these conditions.
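For the ranking step alone, a minimal sketch follows. It assumes the earth mover's (1-Wasserstein) distance to the all-zero ideal is an acceptable score. Since every value has to be moved to 0, that distance is simply the mean absolute value. It is distribution-free, treats -5 and +5 the same, and ignores value order. However, it blends offset and spread into one number rather than weighting them as two separate, equal terms, so it only loosely meets the equal-weighting requirement. The data sets below are hypothetical stand-ins for B and C. SciPy's `scipy.stats.wasserstein_distance`, applied against an array of zeros, should give the same value.

```python
def mean_abs_distance_to_zero(values):
    """Earth mover's (1-Wasserstein) distance from the empirical distribution
    of `values` to a point mass at zero: every value must be moved to 0, so
    the cost is the mean absolute value."""
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    return sum(abs(v) for v in values) / len(values)


def rank_against_zero_ideal(datasets):
    """Return data set names ordered from most to least similar to the
    all-zero ideal (smallest distance first). Ties keep input order."""
    return sorted(datasets, key=lambda name: mean_abs_distance_to_zero(datasets[name]))


if __name__ == "__main__":
    # Hypothetical stand-ins for Data sets B and C, for illustration only.
    datasets = {
        "B": [0.5, -0.5] * 2500,  # centred on 0, narrow spread
        "C": [2.0] * 5000,        # a single spike, but offset from 0
    }
    for name in rank_against_zero_ideal(datasets):
        print(name, mean_abs_distance_to_zero(datasets[name]))
```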

Thank you so so so much.
