Every object in my dataset is described as a vector of n=20 features. All the features are integers, but they have different scales. I need to choose a measure to evaluate the similarity of two objects in the dataset, and it has to satisfy the following condition: two feature vectors that are identical (i.e., they contain exactly the same numbers) must always get the same similarity value, whatever those numbers are. I have already tried different similarity measures, such as the dot product and cosine similarity:
- The dot product does not work in my case because the similarity value depends on the specific numbers in the feature vectors. For example, given the two objects a=[2, 2, 30, 4, 5] and b=[2, 2, 30, 4, 5], similarity(a, b)=949. Given the two vectors c=[2, 2, 300, 4, 5] and d=[2, 2, 300, 4, 5], similarity(c, d)=90049. I want the similarity to be the same number in both cases, i.e., similarity(a, b) = similarity(c, d).
- Cosine similarity does not work in my case because it only takes into account the angle between the vectors, and I also need to take their magnitude into account. For example, given the two objects a=[2, 2, 30, 4, 5] and b=[4, 4, 60, 8, 10], similarity(a, b) = 1 (the maximum similarity). Since the numbers in the feature vectors are different, in my case their similarity should not be the maximum.

It seems to me that standardizing the features and then using a Euclidean distance, a Manhattan distance, or more generally a Minkowski distance is the most suitable solution. Can you suggest other distance measures that would be more suitable for my scenario?
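
For concreteness, here is a minimal sketch of the standardize-then-Minkowski idea. It assumes NumPy and a toy dataset built from the example vectors above (5 features instead of 20). The helper names `standardize`, `minkowski` and `similarity`, and the `1 / (1 + d)` conversion from distance to similarity, are only illustrative choices, not a settled design.

```python
import numpy as np

def standardize(X):
    """Z-score each feature (column) using the dataset's own mean and std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    return (X - mean) / std

def minkowski(u, v, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

def similarity(u, v, p=2):
    """Assumed conversion: distance 0 maps to similarity 1, larger distances to less."""
    return 1.0 / (1.0 + minkowski(u, v, p))

# Toy dataset (an assumption for illustration) using the vectors from the examples.
X = np.array([
    [2, 2, 30, 4, 5],    # a
    [2, 2, 30, 4, 5],    # b, identical to a
    [2, 2, 300, 4, 5],   # c
    [2, 2, 300, 4, 5],   # d, identical to c
    [4, 4, 60, 8, 10],   # 2 * a, the case where cosine similarity gives 1
])
Z = standardize(X)

sim_ab = similarity(Z[0], Z[1])     # identical vectors
sim_cd = similarity(Z[2], Z[3])     # identical vectors, different numbers
sim_a_2a = similarity(Z[0], Z[4])   # same direction, different magnitude

print(sim_ab, sim_cd, sim_a_2a)
```

With this setup, identical vectors are at distance 0 and therefore get the same similarity no matter which numbers they contain, while a and 2a are no longer treated as maximally similar. Note that the standardized values depend on the dataset used to compute the per-feature mean and standard deviation.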