I've been working with the clustering algorithm called affinity propagation (AP) and feel fairly confident in its results. If you want an overall picture of a data set, AP returns a set of exemplars together with the data points assigned to each exemplar. The exemplars are the subset of the data that best represents the whole data set under the given similarity matrix. For example, if you give it 1000 images and construct a similarity matrix based on hue, it will return the subset of images that best describes the hue of all the pictures.
The problem with AP is that its clustering results can be shaky. The issue lies in the similarity matrix itself, specifically its diagonal entries, which AP calls the preference values. These values can make or break the clustering. Set the preference too high and you get too many clusters; set it too low and you get too few. The trick is finding the happy medium, which the developers say is a number between the lowest and highest similarity value, often the median. You can adjust the preference between these two values until you get a result that represents what you are looking for.
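To make the effect of the preference concrete, here is a minimal sketch, assuming scikit-learn's `AffinityPropagation` with a precomputed similarity matrix. The toy 2-D points stand in for image features, and the negative squared Euclidean distance is just one common choice of similarity; both are my assumptions for illustration, not part of the setup described above.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Made-up data: three well-separated 2-D blobs standing in for image features.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centers])

# Similarity = negative squared Euclidean distance (an assumed choice).
diff = X[:, None, :] - X[None, :, :]
S = -np.sum(diff ** 2, axis=-1)

# Candidate shared preference values, taken from the off-diagonal similarities.
off_diag = S[~np.eye(len(X), dtype=bool)]
candidates = {
    "minimum": off_diag.min(),
    "median": np.median(off_diag),
    "zero": 0.0,  # higher than every off-diagonal similarity in this data
}

def count_exemplars(similarity, preference):
    """Fit AP on a precomputed similarity matrix and count the exemplars."""
    ap = AffinityPropagation(
        affinity="precomputed",
        preference=preference,  # written onto the diagonal by scikit-learn
        damping=0.9,
        max_iter=1000,
        random_state=0,
    )
    ap.fit(similarity)
    return len(ap.cluster_centers_indices_)

counts = {name: count_exemplars(S, p) for name, p in candidates.items()}
for name, k in counts.items():
    print(f"preference={name:8s} -> {k} exemplars")
```

On data like this, the lowest preference should give the fewest exemplars and the highest should give the most, which is the knob being tuned when you search for the happy medium.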
I want to "adjust" these preference values. Currently the algorithm uses the same preference value for every point. But what if I could give certain points a different preference? Doing this biases a particular image toward being an exemplar, which in turn influences which images end up clustered together.
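As a rough sketch of per-point preferences: scikit-learn's `AffinityPropagation` accepts an array with one preference per sample. The helper `scores_to_preferences` and the random `scores` below are hypothetical placeholders for whatever per-image bias ends up being used, and mapping them onto the range between the minimum and the median similarity is only one possible convention.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def scores_to_preferences(similarity, scores, low=None, high=None):
    """Linearly map scores in [0, 1] onto preference values in [low, high].

    Hypothetical helper. Assumed defaults: low is the minimum off-diagonal
    similarity and high is the median off-diagonal similarity.
    """
    n = similarity.shape[0]
    off_diag = similarity[~np.eye(n, dtype=bool)]
    low = off_diag.min() if low is None else low
    high = np.median(off_diag) if high is None else high
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, 1.0)
    return low + scores * (high - low)

# Made-up data and similarity matrix (negative squared Euclidean distance).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
S = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# Placeholder scores: a higher score means "more likely a good exemplar".
scores = rng.uniform(size=len(X))
preference = scores_to_preferences(S, scores)

ap = AffinityPropagation(
    affinity="precomputed",
    preference=preference,  # one preference value per sample
    damping=0.9,
    max_iter=1000,
    random_state=0,
).fit(S)

print("exemplar indices:", ap.cluster_centers_indices_)
```

Comparing the exemplars against a run with a single shared preference would show how much the per-point bias actually changes the result.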
What I need is an algorithm that would provide these values, or biases. My first inclination is to look at artificial neural networks (ANNs), because they do a good job at image categorization. What if I trained an ANN on data to get values I can use to "bias" my similarity matrix and preference values?
In this discussion I'd like to get insights others might have on this idea and to make sure the two approaches are compatible. The project idea is to see whether I can enhance AP by using an ANN to give me a better set of preference values. Going further, it would be interesting to see whether an ANN can give me a better similarity matrix.
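For the similarity-matrix half of the idea, one possible shape is to take a feature vector per image from a trained network and compare those vectors instead of raw hue. The sketch below assumes such embeddings already exist (the array here is made up by hand) and uses the negative squared Euclidean distance, which is an assumed choice rather than the only option.

```python
import numpy as np

def similarity_from_embeddings(embeddings):
    """Return the negative squared Euclidean distance between all row pairs.

    `embeddings` is a hypothetical (n_images, n_features) array, e.g. the
    activations of some network layer for each image.
    """
    e = np.asarray(embeddings, dtype=float)
    diff = e[:, None, :] - e[None, :, :]
    return -np.sum(diff ** 2, axis=-1)

# Hand-made stand-in for three image embeddings.
embeddings = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
S = similarity_from_embeddings(embeddings)
print(S)
```

A matrix built this way could be handed to any AP implementation that accepts precomputed similarities, with the preference values placed on the diagonal as before.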
Thanks for your participation.