I suspect that the "weight by 1/distance" option of the IBk algorithm in the Weka tool is actually computed with the formula 1/(distance)^2, but I'm not sure.

Can you help me confirm whether the exponent (^2) is really there?

Moreover, a more difficult question for you:

Suppose I use the same set S as both training set and test set. With k = 1, every instance is obviously labeled correctly, since d(x, x) = 0. If I increase k (for example, k = 9) and choose the weight 1/d, all instances are still labeled correctly, because the weight associated with the nearest point (x itself) tends to infinity. But why do I get many labeling errors when I increase k a lot (for example, k = 299)? I suppose 1/0 is treated as a very large number, but how large?

I should add that the dataset contains no duplicate instances. I am also aware that there can be many points very close to x that carry the opposite label. But, in your opinion, shouldn't the label of x always prevail, since its weight is "infinitely" greater than the other weights? Is it possible that somewhere in the algorithm (for example, in the computation of the Euclidean distances or in the search technique, nearestNeighbourSearchAlgorithm = LinearNNSearch) there is some approximation that is too coarse or limiting?
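To make the question concrete, here is a minimal Python sketch of distance-weighted voting. It is not Weka's code: the small constant `eps` added to the distance to avoid dividing by zero, its value 0.001, the `power` parameter, and the helper `make_neighbors` are all my own assumptions for illustration. If something like `eps` exists in the real implementation, the weight of x itself would be large but finite, and enough close neighbors with the opposite label could outvote it.

```python
from collections import defaultdict


def weighted_vote(neighbors, eps=0.001, power=1):
    """Return the label with the largest total weight.

    neighbors: list of (distance, label) pairs for the k nearest points.
    eps:       hypothetical offset that keeps 1/d finite when d == 0.
    power:     1 for weight 1/d, 2 for weight 1/d^2.
    """
    totals = defaultdict(float)
    for dist, label in neighbors:
        totals[label] += 1.0 / (dist + eps) ** power
    return max(totals, key=totals.get)


def make_neighbors(k, other_distance=0.05):
    """Hypothetical neighborhood: x itself (distance 0, label "A")
    plus k - 1 points with the opposite label "B" at a fixed distance."""
    return [(0.0, "A")] + [(other_distance, "B")] * (k - 1)


for k in (1, 9, 299):
    print(k, weighted_vote(make_neighbors(k)))
```

Under these assumed numbers, x's own weight is 1/0.001 = 1000, while each opposite-label neighbor contributes about 19.6. Eight such neighbors cannot outvote x, but 298 of them can, which would match the behavior I observe if the weight for d = 0 is capped in some similar way.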

Thank you very much

Gaetano
