Sometimes it's mentioned that, as a rule of thumb, setting K to the square root of the number of training patterns/samples can lead to better results.
Is there any justification for that rule, or have you ever seen it in a paper?
Are there any other straightforward ways to choose K?
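
To be concrete, this is how I read the rule. It is only a minimal sketch: the function name `sqrt_rule_k` is my own, and rounding to the nearest integer (with a floor of 1) is my assumption, since the rule as usually stated does not say how to round.

```python
import math


def sqrt_rule_k(n_samples: int) -> int:
    """Rule-of-thumb K: the square root of the number of training samples.

    Rounding to the nearest integer and clamping to at least 1 are
    assumptions made for this sketch, not part of the rule itself.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    return max(1, round(math.sqrt(n_samples)))


# Example: K suggested for a few training-set sizes.
for n in (100, 1000, 10000):
    print(n, sqrt_rule_k(n))
```

So with 1000 training samples the rule would suggest K = 32, and the question is whether that choice has any theoretical or empirical backing.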