Clearly, the answer depends on which class of unsupervised algorithms you are referring to.
For example, dimensionality reduction techniques are generally evaluated by computing the reconstruction error. You can do this with techniques similar to those used for supervised algorithms, e.g. by using a holdout test set or by applying a k-fold cross-validation procedure.
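A minimal sketch of that idea (my own illustration, assuming PCA as the reduction technique and the iris data as a stand-in dataset): fit on the training folds, then measure reconstruction error on the held-out fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

X = load_iris().data
errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    pca = PCA(n_components=2).fit(X[train_idx])
    # Project the held-out fold and map it back to the original space.
    X_rec = pca.inverse_transform(pca.transform(X[test_idx]))
    errors.append(np.mean((X[test_idx] - X_rec) ** 2))

print("mean held-out reconstruction error:", np.mean(errors))
```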
Clustering algorithms are more difficult to evaluate. Internal metrics [1] use only information about the computed clusters to evaluate whether they are compact and well-separated (this is also what the answer by A.G. Ramakrishnan mentions). In addition, there are external metrics that perform statistical testing on the structure of your data [1].
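As a concrete internal metric, here is a minimal sketch using the silhouette coefficient (the choice of k-means and synthetic blobs is mine, not prescribed above):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Values close to 1 indicate compact, well-separated clusters.
print("silhouette:", silhouette_score(X, labels))
```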
Density estimation is also rather difficult to evaluate, but there is a wide range of techniques, mostly used for model tuning [2], e.g. cross-validation procedures.
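For instance, a minimal sketch of cross-validated model tuning for kernel density estimation (assuming a Gaussian kernel and a grid search over the bandwidth; scikit-learn's `KernelDensity.score` is the held-out log-likelihood used by the search):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))

grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.logspace(-1, 0, 10)},
                    cv=5)
grid.fit(X)
print("bandwidth selected by cross-validation:", grid.best_params_["bandwidth"])
```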
In addition, unsupervised strategies are sometimes used in the context of a more complex workflow, in which an extrinsic performance function can be defined. For example, if clustering is used to create meaningful classes (e.g. clustering documents), it is possible to create an external dataset by hand-labelling and test the accuracy (the so-called gold standard). Similarly, if dimensionality reduction is used as a pre-processing step in a supervised learning procedure, the accuracy of the latter can be used as a proxy performance measure for the dimensionality reduction technique.
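A minimal sketch of both extrinsic setups (the dataset, k-means, PCA with 30 components, and logistic regression are my own illustrative choices): (a) comparing clusters to hand-labelled gold-standard classes, (b) using downstream classification accuracy as a proxy for the dimensionality reduction step.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

# (a) external (gold-standard) evaluation of a clustering
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
print("ARI vs. gold standard:", adjusted_rand_score(y, clusters))

# (b) downstream accuracy as a proxy measure for dimensionality reduction
pipe = make_pipeline(PCA(n_components=30), LogisticRegression(max_iter=1000))
print("accuracy with PCA pre-processing:", cross_val_score(pipe, X, y, cv=5).mean())
```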
[1] Halkidi, Maria, Yannis Batistakis, and Michalis Vazirgiannis. "On clustering validation techniques." Journal of Intelligent Information Systems 17.2-3 (2001): 107-145.
[2] Hall, Peter, Jeff Racine, and Qi Li. "Cross-validation and the estimation of conditional probability densities." Journal of the American Statistical Association 99.468 (2004).
Basically, you are clustering the data in the feature space. So, you can look at the intra-cluster variance and the inter-cluster variance. For example, you can separate characters of different colours in a document image (even a camera-captured one) using unsupervised methods, and you know how to evaluate those results.
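A minimal sketch of those two quantities for a k-means result (my own illustration on synthetic data, not from the answer above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Intra-cluster: average squared distance of points to their own centroid.
intra = np.mean(np.sum((X - km.cluster_centers_[km.labels_]) ** 2, axis=1))

# Inter-cluster: average squared distance between pairs of centroids.
centroids = km.cluster_centers_
inter = np.mean([np.sum((a - b) ** 2)
                 for i, a in enumerate(centroids)
                 for b in centroids[i + 1:]])

print("intra-cluster variance:", intra, " inter-cluster variance:", inter)
```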
First, you must know whether the data set consists of labelled (supervised) data or not. In many studies, supervised learning is applied to unsupervised data and vice versa.
Here is a thesis suggesting that cross-validation is a valuable tool for unsupervised learning...
I found it here http://udini.proquest.com/view/cross-validation-for-unsupervised-pqid:1904931481/ and the full text is available here http://arxiv.org/pdf/0909.3052.pdf
You can simply evaluate its accuracy on classification problems.
Let k be the number of distinct classes in the classification problem. Apply your clustering algorithm to find k clusters and assign a label to each cluster (based on the most frequent class within it). Then use these labels as the predicted classes and evaluate the performance, as in the sketch below.
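A minimal sketch of this majority-label mapping (k-means on the digits dataset is my own illustrative choice):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
k = len(np.unique(y))                      # number of distinct classes
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Label each cluster with its most frequent true class, then score accuracy.
y_pred = np.empty_like(y)
for c in range(k):
    mask = clusters == c
    y_pred[mask] = np.bincount(y[mask]).argmax()

print("clustering accuracy:", accuracy_score(y, y_pred))
```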
I have to formally validate an association rule algorithm. The algorithm produces rules such as "A, B, C -> D, E" and so on. Are there any tools or instruments that I can use? Or what is the best approach to do that?
Yes, I wanted the answer to that too. I'm trying to build a validation model for a retail dataset (it only contains the items bought by multiple customers in individual transactions, about 90,000 transactions in total), but I am very confused about where to start.
All I know is this:
1) Divide the dataset into training, validation, and test sets.
2) Apply the model on training, and later test it on the test dataset. What is the validation dataset used for?
3) How do I create a model? I was told I have to create Apriori and Eclat testing models, but how? (See the sketch after this list for one way to check mined rules on a held-out split.)
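A minimal sketch of one possible validation setup (my own illustration, not a prescribed method): split the transactions, mine or choose a rule on the training part (e.g. with Apriori), then measure its support and confidence on the held-out part. The toy transactions and the rule {bread} -> {milk} are placeholders for your retail data.

```python
import random

transactions = [
    {"bread", "milk"}, {"bread", "butter"}, {"milk", "butter", "bread"},
    {"beer", "chips"}, {"bread", "milk", "butter"}, {"beer", "bread"},
]  # in practice: the ~90,000 retail transactions

random.seed(0)
random.shuffle(transactions)
split = int(0.7 * len(transactions))
train, test = transactions[:split], transactions[split:]

def support(itemset, data):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in data) / len(data)

def confidence(antecedent, consequent, data):
    return support(antecedent | consequent, data) / support(antecedent, data)

# Suppose the rule {bread} -> {milk} was mined from `train`;
# validate it on the unseen transactions.
rule_a, rule_c = {"bread"}, {"milk"}
print("held-out support:", support(rule_a | rule_c, test))
print("held-out confidence:", confidence(rule_a, rule_c, test))
```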
This is really an important question when using latent class (profile) analysis to separate a population. The following paper utilized an interesting technique for the validation.
Please take a look at the educational material provided by RapidMiner. Additionally, the recent book Data Science by Vijay Kotu and Bala Deshpande provides excellent illustrations of how to solve this.