Can someone help me understand what k-means and r-means are, and why they are important in data mining?

K-means is a clustering method.

When you apply a clustering method to your dataset, it separates your data into groups that maximize the similarity between data in the same group and maximize the dissimilarity between data in different groups. The number of groups is an input parameter of the problem; that is, you choose it.

K-means groups the data and returns k centroids, i.e. k vectors that represent the center points of the groups, together with a matrix that assigns each sample in your dataset to a group.
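To make the two outputs concrete, here is a minimal sketch of the usual iterative procedure (assign each sample to its closest centroid, then move each centroid to the mean of its members). The function names, the random choice of starting centroids, and the `iterations` and `seed` parameters are my own assumptions for illustration, and the assignment is returned as a plain list of labels rather than a matrix.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Assumption: start from k samples picked at random from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each sample goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # nothing changed, so we have converged
            break
        labels = new_labels
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

On this toy dataset the first three samples end up in one group and the last three in the other; which group gets index 0 depends on the starting centroids.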

K-means is a hard method, i.e. it assigns exactly one group to each sample.

There is a soft variant of k-means, named Fuzzy C-means, that allows you to assign each sample to several groups, each with a membership value.
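As a rough illustration of what a membership value looks like, the sketch below computes the standard Fuzzy C-means memberships of one sample for a fixed set of centroids. It covers only the membership step (the full method alternates it with a weighted centroid update); the function name and the default fuzzifier `m=2.0` are assumptions made for the example.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Membership of one sample in each group, given fixed centroids."""
    distances = [math.dist(point, c) for c in centroids]
    # A sample lying exactly on a centroid belongs fully to that group.
    if 0.0 in distances:
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)  # m > 1 controls how soft the assignment is
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# A sample at distance 1 from the first centroid and 3 from the second.
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

With these inputs the memberships come out as 0.9 and 0.1, and they always sum to 1, whereas hard k-means would simply report the first group.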

Clustering methods belong to unsupervised learning, because they need no labels and allow you to find hidden connections in the data, discovering knowledge.

For this reason, these methods are very relevant in the data mining process, where we want to "mine" knowledge from data.

Unsupervised problems are tricky: since the information that we want to extract from the data is not known a priori, evaluating the results is not as simple as in classification tasks.

I'm not an expert in business data, but suppose that your dataset represents information about the customers of a shop. Then k-means can group the customers according to a similarity measure.

Hope that the main concepts are clear now! :)

Gabriella Casalino

Nenad Tomašev

K-means is a simple method that is often used to partition (cluster) the data in an unsupervised way. The centers move around over many iterations until they converge to a local optimum. Different initializations can lead to different optima (depending on the basins of attraction), so the clustering is usually re-run several times and the best result is taken. Alternatively, more care can be taken while initializing the centroids; look, for example, at K-means++.
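For reference, the K-means++ idea is to pick the first center at random and each following center with probability proportional to its squared distance from the nearest center already chosen. A minimal sketch of that seeding step is below; the function name and the `seed` parameter are assumptions, and it presumes the data contain at least k distinct points.

```python
import random


def kmeans_pp_init(points, k, seed=0):
    """Pick k starting centers with K-means++ style seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each sample to its nearest chosen center.
        weights = [
            min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centers)
            for p in points
        ]
        # Far-away samples are more likely to become the next center;
        # samples that coincide with a chosen center have weight zero.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


starting_centers = kmeans_pp_init([(0, 0), (0, 0), (5, 5)], k=2)
```

The returned centers would then replace a purely random start in the usual iterations. Re-running with different seeds and keeping the run with the lowest total squared distance is the "several restarts" strategy mentioned above.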

K-means can be easily kernelized and Kernel K-means allows for detecting clusters that are not (hyper)spherical in shape.
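To hint at how the kernelization works: the squared distance from a sample to a cluster mean in feature space can be written using only kernel values, so the centroids never need to be computed explicitly. The sketch below is one simple way to write it, under the assumptions that the kernel matrix `K` is given as a list of lists and that initial labels are supplied; the function name and parameters are hypothetical.

```python
def kernel_kmeans(K, labels, k, iterations=50):
    """Reassign samples using distances computed from kernel values only."""
    n = len(K)
    labels = list(labels)
    for _ in range(iterations):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean kernel value inside each cluster (squared norm of its mean).
        within = [
            sum(K[a][b] for a in members for b in members) / len(members) ** 2
            if members else 0.0
            for members in clusters
        ]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:  # skip empty clusters
                    continue
                cross = sum(K[i][j] for j in members) / len(members)
                # ||phi(x_i) - mean_c||^2 expressed through the kernel.
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# With a linear kernel this reduces to ordinary k-means on 1-D data.
xs = [0.0, 1.0, 10.0, 11.0]
K = [[a * b for b in xs] for a in xs]
final_labels = kernel_kmeans(K, labels=[0, 1, 0, 1], k=2)
```

Swapping the linear kernel for, say, a Gaussian (RBF) kernel matrix is what lets the same loop pick up non-spherical clusters.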

Spatial indexing can be used to speed up the search for the closest centroid by pruning out the obviously distant ones (this is not crucial for small values of K, but almost necessary for larger ones; in my implementation it speeds up the clustering by a factor of 5-10, depending on the data).

Also, the K-means search can be made a bit stochastic in order to avoid getting stuck in local optima and to guide the final configuration toward a global optimum. We have recently proposed one such method, the Global Hubness-Proportional K-Means (GHPKM), which performs rather well on high-dimensional data. The details can be found in the journal paper here:

http://www.computer.org/csdl/trans/tk/preprint/06427743-abs.html

Arun Rajendran

Thank you, Gabriella Casalino and Nenad Tomašev, for your explanations. This will do for now. I am in the process of understanding predictive analysis in data mining, which helps in making intelligent decisions for any business.

Thank you once again for the inputs.
