Suppose we have a fixed, special-purpose similarity function that computes the similarity between any pair of data points. The data are sparse, so the similarity matrix will be sparse as well.

The goal is to build the similarity matrix with fewer than O(n^2) calls to the similarity function, where n is the number of data points. In other words, we want to avoid evaluating the function for all n^2 pairs of data points.
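To make the role of sparsity concrete, here is a minimal sketch of one way the number of calls could be cut. It rests on an assumption not stated above: that two points sharing no non-zero feature always have similarity zero. The dictionary representation of points, the placeholder `sim` (a plain dot product) and the helper name `sparse_similarity` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def sim(a, b):
    """Placeholder for the fixed similarity function (here: sparse dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(v * b[k] for k, v in a.items() if k in b)


def sparse_similarity(points):
    """Evaluate sim only for pairs that share at least one non-zero feature.

    ASSUMPTION: sim(a, b) == 0 whenever a and b share no feature, so every
    pair that is skipped here would have been a zero entry anyway.
    """
    # Inverted index: feature -> indices of the points where it is non-zero.
    index = defaultdict(list)
    for i, p in enumerate(points):
        for feature in p:
            index[feature].append(i)

    # Candidate pairs (i, j) with i < j come from each feature's index list.
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(ids, 2))

    return {(i, j): sim(points[i], points[j]) for i, j in candidates}


points = [
    {"a": 1.0, "b": 2.0},
    {"b": 3.0},
    {"c": 1.0},
    {"c": 2.0, "d": 1.0},
]
result = sparse_similarity(points)
print(result)  # two evaluated pairs instead of all six
```

The saving depends on how the non-zeros are distributed: a feature that is non-zero in most points produces almost all pairs as candidates, so the worst case is still quadratic.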

One easy approach is to cluster the data first and then build a smaller similarity matrix over the cluster centers instead of over the individual data points.
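The clustering idea can be sketched roughly as follows. This is not a recommendation of a particular method: it swaps in simple leader clustering (each cluster is represented by an actual data point rather than a computed center), uses cosine similarity as a stand-in for the fixed similarity function, and the names `leader_clustering`, `leader_similarity_matrix` and the threshold value are assumptions made for the example.

```python
import math


def sim(a, b):
    """Stand-in for the fixed similarity function: cosine on sparse dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def leader_clustering(points, threshold):
    """Assign each point to its most similar leader, or make it a new leader.

    Costs at most n * k similarity calls, where k is the number of leaders.
    """
    leaders = []      # indices of the points acting as cluster representatives
    assignment = []   # assignment[i] = cluster id of point i
    for i, p in enumerate(points):
        best, best_sim = None, threshold
        for c, leader in enumerate(leaders):
            s = sim(p, points[leader])
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:          # no leader is similar enough
            leaders.append(i)
            best = len(leaders) - 1
        assignment.append(best)
    return leaders, assignment


def leader_similarity_matrix(points, leaders):
    """Dense k x k similarity matrix over the cluster representatives only."""
    k = len(leaders)
    matrix = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            s = sim(points[leaders[a]], points[leaders[b]])
            matrix[a][b] = matrix[b][a] = s
    return matrix


points = [
    {"a": 1.0, "b": 1.0},
    {"a": 1.0, "b": 0.9},
    {"c": 1.0},
    {"c": 2.0, "d": 0.1},
]
leaders, assignment = leader_clustering(points, threshold=0.8)
matrix = leader_similarity_matrix(points, leaders)
```

With k representatives the total cost is on the order of n·k + k^2 calls, which is below n^2 only when k is much smaller than n. The result is also order-dependent and only approximates the full matrix, since the similarity between two points is replaced by the similarity between their representatives.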

I am looking for a solution to this problem that takes the data sparsity into account. I am also looking for the clustering method best suited to sparse data for this purpose.
