t-Distributed Stochastic Neighbor Embedding (t-SNE): I understand that the method reduces dimensionality, but some insight into its behavior would help. If I run t-SNE a hundred times, why should I select the solution with the lowest KL divergence? Is there a theoretical guarantee behind that choice?
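
For context, here is the quantity I believe is being compared across runs, written out as a sketch based on my reading of the standard t-SNE formulation. The symbols are my own labels: $p_{ij}$ for the pairwise similarities in the original space, $q_{ij}$ for the similarities in the low-dimensional map, and $y_i$ for the embedded points.

```latex
% Cost minimized by each t-SNE run: KL divergence between the
% high-dimensional similarities P and the map similarities Q.
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```

As far as I can tell, this cost is non-convex in the $y_i$, so runs started from different random initializations can end in different local minima with different final values of $C$. That is why the hundred runs do not all give the same answer, and why I am asking whether the lowest value of $C$ comes with any guarantee.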
# Machine Learning
# Data Visualization