Can anyone help me use the KDD dataset in WEKA to implement different data mining techniques? If anyone has a step-by-step guide, I would be very grateful.
I think you can find useful information at the following links. Besides, you can download these links by using the http://keepvid.com website from China.
I worked on a project a few months ago using this dataset (KDD Cup), with WEKA and Python. For WEKA, you have two choices. The first and simplest is to download the dataset in WEKA format (.arff file) from http://tunedit.org/repo/KDD_Cup. The second is to download the original from http://www.kdd.org/kdd-cup/view/kdd-cup-2009/Data and convert it into WEKA format yourself.
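If you go the conversion route, here is a minimal Python sketch of turning CSV text into ARFF. It assumes the first row holds the column names, treats a column as numeric when every non-empty value parses as a number, and treats everything else as nominal. The function name, the relation name and the sample columns are my own placeholders, not something from the KDD files. Weka also ships its own CSV converter (weka.core.converters.CSVLoader), so this is only useful if you want to do the conversion outside Weka.

```python
import csv
import io


def _is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def _quote(value):
    """Wrap a name or value in single quotes if it contains ARFF-special characters."""
    if value and not any(ch in value for ch in " \t,'\"{}%\\"):
        return value
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def csv_to_arff(csv_text, relation="kdd_sample"):
    """Convert CSV text (first row = column names) into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = ["@relation " + _quote(relation), ""]
    for i, name in enumerate(header):
        # Empty cells are treated as missing and ignored when guessing the type.
        column = [row[i] for row in data if row[i] != ""]
        if all(_is_number(v) for v in column):
            lines.append("@attribute %s numeric" % _quote(name))
        else:
            values = ",".join(_quote(v) for v in sorted(set(column)))
            lines.append("@attribute %s {%s}" % (_quote(name), values))
    lines += ["", "@data"]
    for row in data:
        # ARFF marks missing values with a question mark.
        lines.append(",".join("?" if v == "" else _quote(v) for v in row))
    return "\n".join(lines) + "\n"
```

For a real file you would read it with open(...).read() and write the returned string to a .arff file. For a file too big for memory, the type guessing would have to be done in a separate streaming pass.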
Be aware that the dataset is huge, and WEKA can crash because it cannot handle a file that large. The solution is to create a subset (10%) and then try the data mining methods available in WEKA on it.
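One way to cut the 10% subset without loading the whole file is to stream the ARFF file line by line and keep each data row with probability 0.1. This is just a sketch: the function name, the seed and the file names in the usage note are placeholders of mine, and it assumes one instance per line after the @data marker. The result is roughly 10%, not exactly 10%.

```python
import random


def sample_arff(lines, fraction=0.10, seed=42):
    """Yield the ARFF header unchanged and roughly `fraction` of the data rows."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    in_data = False
    for line in lines:
        if not in_data:
            # Copy every header line, including the @data marker itself.
            yield line
            if line.strip().lower() == "@data":
                in_data = True
        elif not line.strip() or line.lstrip().startswith("%"):
            continue  # skip blank lines and comments inside the data section
        elif rng.random() < fraction:
            yield line
```

Usage would look something like this: open the big file for reading and the small one for writing, then call dst.writelines(sample_arff(src)). Only one line is held in memory at a time.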
In Weka it is very easy to compare the performance of different classifiers against each other using the Experimenter interface. Load the dataset, apply filters if necessary, add the classifiers one by one or all at once, run the experiment, and look at the results. You can also save the output as CSV or another format and analyze it separately.
However, the Experimenter is not a good choice for very large datasets. Randomly splitting the data (preferably stratified on the class attribute) is a good option if it is convenient for your purposes.
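To make the "split on the class feature" idea concrete, here is a small sketch of a stratified random split in Python: rows are grouped by class label and each group is split separately, so both parts keep about the same class proportions. The function name, the 70/30 default and the assumption that the class is the last column are mine.

```python
import random
from collections import defaultdict


def stratified_split(rows, class_index=-1, train_fraction=0.7, seed=42):
    """Split rows into (train, test) while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_index]].append(row)

    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)  # random order within each class
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test
```

Each row can be a list or tuple of values. Note that this version keeps all rows in memory, so it is meant for a subset that already fits, not for the full file.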
If you want to process very large datasets in Weka, use the command-line interface (CLI), pick only fast-converging algorithms, and avoid cross-validation for evaluation.
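As a rough illustration of a command-line run, the sketch below builds the java command from Python. The -t (training file) and -T (test file) options are standard Weka evaluation options, and when a separate test file is given Weka evaluates on it instead of cross-validating. The jar location, the heap size, the file names and the choice of classifier are assumptions for the example, so adjust them to your installation.

```python
import subprocess


def weka_command(classifier, train_arff, test_arff, weka_jar="weka.jar", heap="4g"):
    """Build the argument list for a Weka command-line run with a separate test set."""
    return [
        "java", "-Xmx" + heap,  # JVM heap limit; raise it if Weka runs out of memory
        "-cp", weka_jar,        # path to weka.jar (assumed to be in the current directory)
        classifier,             # fully qualified classifier class name
        "-t", train_arff,       # training file
        "-T", test_arff,        # test file, so no cross-validation is performed
    ]


def run_weka(classifier, train_arff, test_arff):
    """Run Weka and return its text output (requires Java and weka.jar)."""
    cmd = weka_command(classifier, train_arff, test_arff)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, run_weka("weka.classifiers.bayes.NaiveBayesUpdateable", "train.arff", "test.arff") would train on one file and report accuracy on the other.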
The best choice is the updateable algorithms (incremental classification models that learn from only one instance at a time). Weka provides a good range of these.
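To show what "learning from one instance at a time" means, here is a toy incremental naive Bayes for categorical attributes in Python. It is not Weka's implementation (Weka's own version is weka.classifiers.bayes.NaiveBayesUpdateable), just a sketch of the idea: the model only keeps counts, so each instance can be thrown away after the update and memory does not grow with the number of rows. The class and method names are made up for the example.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical naive Bayes that is updated with one instance at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)  # label -> number of instances seen
        self.value_counts = defaultdict(int)  # (attribute index, value, label) -> count
        self.values_seen = defaultdict(set)   # attribute index -> distinct values

    def update(self, features, label):
        """Fold a single labelled instance into the counts."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.value_counts[(i, value, label)] += 1
            self.values_seen[i].add(value)

    def predict(self, features):
        """Return the label with the highest log-probability score."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, value in enumerate(features):
                count = self.value_counts.get((i, value, label), 0)
                # Laplace smoothing so unseen values do not zero out the score.
                k = len(self.values_seen[i]) or 1
                score += math.log((count + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In practice you would read the data file row by row, call update() for each row, and never hold more than one row in memory.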