In K-means clustering, every data point is assigned to a cluster, so points that should be treated as noise end up inside clusters as well. Is there any way to reduce such noise, or a hybrid of K-means clustering that handles it?
I guess it depends on which algorithm you use to get the detections before applying K-means. You could also filter the positions first with another algorithm, using parameters such as the radius of the neighborhood or the number of positions required within that neighborhood, and only then run a Ripley's K analysis. You can also use the neighborhood density function, which is a non-cumulative way of studying clustering.
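As a rough sketch of that kind of pre-filter (not from the thread itself; scipy is assumed, and the radius and count thresholds below are made-up illustrative values in the data's own units):

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_by_neighborhood(points, radius, min_neighbors):
    """Keep only points that have at least `min_neighbors` OTHER points
    within `radius`; isolated detections are dropped as noise."""
    tree = cKDTree(points)
    # query_ball_point includes the point itself, hence the -1
    counts = np.array(
        [len(tree.query_ball_point(p, radius)) - 1 for p in points]
    )
    return points[counts >= min_neighbors]

# toy example: one tight cluster plus scattered noise
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 1.0, size=(50, 2))
noise = rng.uniform(-50.0, 50.0, size=(10, 2))
data = np.vstack([cluster, noise])

filtered = filter_by_neighborhood(data, radius=5.0, min_neighbors=3)
```

Only after such a filter would one move on to Ripley's K or a clustering step, so the isolated detections no longer distort the statistics.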
In any case, the best way to reduce noise depends on the technique you use to acquire the data and on how you treat the sample.
I'm currently working with STORM, a super-resolution microscopy technique. Overcounting and generating thousands of artifacts is a really common mistake, so we have to be very careful about this. There are hundreds of papers on clustering in the context of microscopy.
I will try to incorporate these suggestions and also look into the neighborhood density function. I work with protein-protein interactions, so I have to find the best binding pose to predict the binding orientation through statistical methods.
I have tried the neighborhood density function, and it worked fine after a little optimization of the maximum neighborhood distance (MND). The problem is that I analyzed the data manually and selected the MND by hand. Is there any way to compute the optimum MND with an algorithm? The dataset has 2000 points, and finding the MND manually every time costs too much effort.
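One common heuristic for this kind of distance threshold (borrowed from how DBSCAN's eps is often chosen, not something the thread itself prescribes) is the sorted k-nearest-neighbor distance plot: the "knee" of that curve is a reasonable automatic MND candidate. A sketch, assuming scipy and using the simple maximum-distance-to-chord rule to locate the knee:

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_mnd(points, k=4):
    """Heuristic MND: sort every point's distance to its k-th nearest
    neighbor, then pick the 'knee' of that curve -- the sample farthest
    from the straight line joining the curve's two endpoints."""
    tree = cKDTree(points)
    # k + 1 because each point's nearest neighbor is itself
    dists, _ = tree.query(points, k=k + 1)
    kdist = np.sort(dists[:, -1])
    n = len(kdist)
    x = np.arange(n)
    x0, y0, x1, y1 = 0.0, kdist[0], float(n - 1), kdist[-1]
    # perpendicular distance of each curve sample from the endpoint chord
    num = np.abs((y1 - y0) * x - (x1 - x0) * kdist + x1 * y0 - y1 * x0)
    den = np.hypot(y1 - y0, x1 - x0)
    knee = int(np.argmax(num / den))
    return kdist[knee]

# toy data: two clusters plus uniform background noise
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
    rng.normal([10.0, 10.0], 0.5, size=(100, 2)),
    rng.uniform(-5.0, 15.0, size=(20, 2)),
])
mnd = estimate_mnd(data, k=4)
```

The value of `k` still has to be chosen, but it is far less sensitive than the MND itself, so one fixed `k` can usually be reused across datasets of similar density.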
I'm not sure what you mean by "data points which are supposed to be treated as noise" - did you create them, or do they just appear to be noisy?
Data that doesn't fit nicely into any class may be noise, or it may be a new, previously undiscovered class...
Take a look at http://ti.arc.nasa.gov/tech/rse/synthesis-projects-applications/autoclass/references/ ; the software is available for download at http://ti.arc.nasa.gov/tech/rse/synthesis-projects-applications/autoclass/
I'll briefly describe my understanding of AutoClass as it applies to your question, but do go to the sources above :) AutoClass implements a hybrid of K-means in which each observation is assigned a probability of belonging to each cluster. The number of clusters is determined automatically, and so is the sense of distance: each cluster has its own distance measure, based on the mean and standard deviation of the observations assigned to it.
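A minimal illustration of those two ideas, soft (probabilistic) cluster membership and automatic selection of the number of clusters, can be sketched with a Gaussian mixture model. To be clear, this is not AutoClass itself (AutoClass is a Bayesian mixture system with its own model search); scikit-learn's `GaussianMixture` with BIC-based model selection is assumed here as a stand-in:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# toy data: two well-separated clusters with different spreads
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
    rng.normal([5.0, 5.0], 1.5, size=(100, 2)),
])

# choose the number of components automatically via BIC,
# echoing AutoClass's automatic determination of cluster count
bics = [
    GaussianMixture(n_components=k, covariance_type="full",
                    random_state=0).fit(data).bic(data)
    for k in range(1, 6)
]
best_k = int(np.argmin(bics)) + 1

gmm = GaussianMixture(n_components=best_k, covariance_type="full",
                      random_state=0).fit(data)
# soft assignment: each row gives the point's membership
# probability in every cluster, and each row sums to 1
probs = gmm.predict_proba(data)
```

The `covariance_type="full"` setting gives each cluster its own covariance, which parallels AutoClass's per-cluster sense of distance; points with no high-probability cluster can then be flagged as likely noise.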
I appreciate that this thread is 4 years old, but I stumbled upon it when I encountered the same issue. I solved it by using a density-based spatial clustering method. I thought I would add this just in case anyone encounters a similar problem and ends up here!
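For anyone landing here, a minimal sketch of that approach, assuming DBSCAN from scikit-learn as the density-based method (the eps and min_samples values below are illustrative, not tuned):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# toy data: two dense clusters plus sparse background noise
rng = np.random.default_rng(2)
clusters = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(40, 2)),
    rng.normal([4.0, 4.0], 0.3, size=(40, 2)),
])
noise = rng.uniform(-2.0, 6.0, size=(8, 2))
data = np.vstack([clusters, noise])

# unlike K-means, DBSCAN leaves low-density points unassigned:
# they get the label -1 instead of being forced into a cluster
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(data)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

This directly addresses the original question: the points K-means would have forced into a cluster come back labeled -1 and can simply be discarded as noise.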