It is sometimes suggested, as a rule of thumb, that setting K (the number of neighbors in k-NN) to the square root of the number of training patterns (samples) leads to better results.
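To make the heuristic concrete, here is a minimal sketch of how it is usually applied. The function name `sqrt_rule_k` is hypothetical. The optional step of bumping K to the next odd number is a common tweak to avoid tied votes in two-class problems, and is an assumption here rather than part of the rule as stated above.

```python
import math


def sqrt_rule_k(n_samples: int, force_odd: bool = True) -> int:
    """Hypothetical helper: pick K for k-NN as roughly sqrt(n_samples).

    If force_odd is True, an even K is bumped to the next odd value
    (an assumed tweak to avoid tied votes in binary classification).
    """
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    # Round the square root of the training-set size to the nearest integer.
    k = max(1, round(math.sqrt(n_samples)))
    if force_odd and k % 2 == 0:
        k += 1  # move to the next odd number
    return k


if __name__ == "__main__":
    print(sqrt_rule_k(100))                   # sqrt(100) = 10, bumped to 11
    print(sqrt_rule_k(100, force_odd=False))  # 10
    print(sqrt_rule_k(150))                   # sqrt(150) ~ 12.25 -> 12, bumped to 13
```

Note that this only encodes the heuristic; it says nothing about whether the resulting K is a good choice for a given dataset.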

Is there any justification for this rule, or have you seen it used in a paper?

Are there any other straightforward ways to choose K?
