I have a question about the WEKA API. I need to read an ARFF file and save only specific, selected attributes to a new ARFF file. Currently, I can only delete the unwanted attributes.
Thanks Samer. I am using WEKA to handle ARFF files and doing my own processing on the data in Java.
I had a look at ArffSaver; it saves the whole Instances object. I need a way to select specific attributes from the Instances object and save them together with the class.
Before going through the code, and to make sure I understand your case correctly, I have attached two files that represent the loaded iris dataset. The first one shows all 5 attributes (including the class attribute), while the second one shows only 3 attributes (including the class) after removing the first two attributes from the data. Now, are you asking how to save those 3 attributes using the API?
NB: AGAIN, I know that you need to work from the API, but my aim is to understand your situation from the GUI in order to provide proper help.
Sorry, in Weka under the Preprocess tab, you can choose filters. Under the "unsupervised-->attribute" part, you can see some options like Remove, RemoveByName etc.
Tunc Guven Kaya, yes, you are right. This is the only way I found in the WEKA API, and it only allows removing unwanted attributes from the Instances object. But I am looking for another way to do it.
Well, it seems you didn't read my previous post correctly, where I clearly mentioned that my main aim was *only* to understand your issue from the GUI, in order to help you with the API side (not to ask you to use the GUI).
Anyway, back to your question. Here are two solutions:
First solution: adjusting the ARFF relation, but it would require more code and would be more likely to yield bugs.
Second solution: using the "inverse" mode of the Remove filter. Something like this:
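The code from the original post did not survive here, so what follows is only a minimal sketch of the second solution, not the poster's own code. It assumes Weka is on the classpath, that the class is the last attribute (as in iris), and that the attributes to keep are numbers 3 to 5; the file names and the class name KeepSelectedAttributes are hypothetical.

```java
import java.io.File;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

class KeepSelectedAttributes {
    public static void main(String[] args) throws Exception {
        // Hypothetical input file; replace with your own path.
        Instances data = DataSource.read("iris.arff");

        Remove keep = new Remove();
        // 1-based range list of the attributes to KEEP.
        // Assumption: the class is the last attribute, so "3-last" includes it.
        keep.setAttributeIndices("3-last");
        // Inverse mode: the listed attributes are kept, all others are removed.
        keep.setInvertSelection(true);
        // Must be called after the options are set.
        keep.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, keep);

        // Write the reduced dataset to a new ARFF file (hypothetical name).
        ArffSaver saver = new ArffSaver();
        saver.setInstances(reduced);
        saver.setFile(new File("iris-reduced.arff"));
        saver.writeBatch();
    }
}
```

With the selection inverted, you list what you want to keep instead of what you want to drop, which is what the original question asked for.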
This won't be a direct answer to your question, but as a Weka laborer for years I find it much easier to perform certain tasks on other platforms like Notepad++, LibreOffice etc.
For example, if you open the data part of the ARFF file in LibreOffice or Excel, you can delete the columns you do not need, and that solves this specific problem.
Moreover, you can perform column operations to create new features or manipulate the existing ones. Let's assume that you have 3 numerical values as your features: f1, f2, f3. You can compute a product like f1*f2 in the next column. You can even derive higher-level features, like the NDVI value for satellite imagery, by doing more complicated calculations.
In Notepad++, you can delete some attributes line by line using regular expressions. Or you can merge some labels with the "find & replace" command.
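If you would rather stay in Java than use a spreadsheet, the same column operations can be sketched in a few lines. This is only an illustration: the class and method names are hypothetical, the rows are assumed to be plain comma-separated values without quoted commas, and NDVI is taken as the usual (NIR - Red) / (NIR + Red).

```java
final class DerivedFeatures {
    /**
     * Appends the product of two numeric columns as a new last column.
     * f1 and f2 are 0-based column positions within the data row.
     */
    static String appendProduct(String row, int f1, int f2) {
        String[] cells = row.split(",");
        double product = Double.parseDouble(cells[f1].trim())
                       * Double.parseDouble(cells[f2].trim());
        return row + "," + product;
    }

    /** NDVI from near-infrared and red reflectance: (NIR - Red) / (NIR + Red). */
    static double ndvi(double nir, double red) {
        return (nir - red) / (nir + red);
    }

    public static void main(String[] args) {
        // The new value is appended after the existing columns.
        System.out.println(appendProduct("2.0,3.0,setosa", 0, 1));
        System.out.println(ndvi(0.5, 0.1));
    }
}
```

Keep in mind that every column you add to the data part also needs a matching @attribute line in the ARFF header, in the same position.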
By using different tools, you can also visualize your data or do normality tests, t-tests etc.
In a nutshell, the data part of the ARFF is just CSV format, and you can copy that part into some other software where you feel more comfortable and competent and perform whatever you need.
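To show how far the "it is just CSV" view goes, here is a rough plain-Java sketch that keeps only chosen columns in both the header and the data part, without Weka. It rests on several assumptions: a dense (non-sparse) ARFF file, one @attribute per line, and no quoted values containing commas. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ArffColumnSelector {
    /** Keeps only the attributes at the given 0-based positions, in header and data. */
    static List<String> keepColumns(List<String> arffLines, int... keep) {
        List<String> out = new ArrayList<>();
        List<String> attributeLines = new ArrayList<>();
        boolean inData = false;
        for (String line : arffLines) {
            String trimmed = line.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT);
            if (!inData && lower.startsWith("@attribute")) {
                attributeLines.add(line);            // buffer until @data is reached
            } else if (!inData && lower.startsWith("@data")) {
                for (int i : keep) {
                    out.add(attributeLines.get(i));  // emit only the kept declarations
                }
                out.add(line);
                inData = true;
            } else if (inData && !trimmed.isEmpty() && !trimmed.startsWith("%")) {
                String[] cells = trimmed.split(",");
                StringBuilder row = new StringBuilder();
                for (int k = 0; k < keep.length; k++) {
                    if (k > 0) {
                        row.append(',');
                    }
                    row.append(cells[keep[k]].trim());
                }
                out.add(row.toString());
            } else {
                out.add(line);                       // @relation, comments, blank lines
            }
        }
        return out;
    }
}
```

For example, calling keepColumns(lines, 2, 3, 4) on the iris file would keep the last two measurements plus the class. For anything beyond simple files, Weka's own loaders and filters are the safer choice, since they handle quoting and sparse data.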