I want to work on permission-based malware detection on Android. My questions are: 1. How do I extract permission features from AndroidManifest.xml? 2. How do I create an ARFF file in Python from those extracted features?
Personally, I use Androguard to extract permissions from Android apps.
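For what it's worth, here is a minimal sketch of that step. It assumes Androguard's `APK` class and its `get_permissions()` method; the import path has changed between releases (`androguard.core.apk` in 4.x, `androguard.core.bytecodes.apk` in 3.x, as far as I know), so the sketch tries both. The helper names `load_apk` and `permissions_of` are mine, not part of Androguard.

```python
def load_apk(path):
    """Open an APK with Androguard (imported lazily so the module loads without it)."""
    try:
        from androguard.core.apk import APK  # assumed location in Androguard 4.x
    except ImportError:
        from androguard.core.bytecodes.apk import APK  # assumed location in 3.x
    return APK(path)


def permissions_of(apk):
    """Return the sorted, de-duplicated permissions requested by an opened APK."""
    return sorted(set(apk.get_permissions()))


# Typical use (the file name is hypothetical):
#   perms = permissions_of(load_apk("sample.apk"))
```

The result is a plain list of strings, so it can be handed to any other Python function directly or dumped to a text or CSV file.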
For 2., I don't know of any tool other than Weka that produces (sparse) ARFF files. That said, the ARFF format is rather simple, so writing a custom CSV-to-sparse-ARFF converter in Python is just a matter of a few dozen lines of code.
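A rough sketch of such a converter, under assumptions of my own: the CSV has a header row, every column except the last holds a 0/1 permission flag, and the last column is the class label. The function name `csv_to_sparse_arff` is hypothetical. Permission attributes are declared numeric so that the values omitted from a sparse row are read back as 0.

```python
import csv
import io


def csv_to_sparse_arff(csv_file, arff_file, relation="permissions"):
    """Convert a 0/1 feature CSV (last column = class label) to sparse ARFF."""
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    *feature_names, class_name = header
    classes = sorted({row[-1] for row in rows})

    arff_file.write(f"@relation {relation}\n\n")
    for name in feature_names:
        arff_file.write(f"@attribute {name} numeric\n")
    arff_file.write(f"@attribute {class_name} {{{','.join(classes)}}}\n\n@data\n")

    for row in rows:
        # Sparse rows list only the non-zero cells as "<0-based index> <value>".
        entries = [f"{i} {v}" for i, v in enumerate(row[:-1]) if v != "0"]
        # The class label is always written explicitly.
        entries.append(f"{len(feature_names)} {row[-1]}")
        arff_file.write("{" + ", ".join(entries) + "}\n")


# Example with in-memory files:
src = io.StringIO("CAMERA,SEND_SMS,class\n1,0,malware\n0,0,benign\n")
out = io.StringIO()
csv_to_sparse_arff(src, out)
```

With real files, pass `open("perms.csv", newline="")` and `open("perms.arff", "w")` instead of the `StringIO` objects (both file names are placeholders).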
A small note: to me, the only advantage of ARFF is that, once loaded in Weka, it takes less memory than the same data in CSV format. (The CSV loader in Weka is /slightly/ inefficient.) If you plan to work only on permissions, whose number is fairly small, your data files will be small enough to fit into Weka's memory whatever the format.
Also, you'll have to define more precisely what you want as features. Just whether a permission is requested (i.e., a boolean)? All permissions (only the "official" Android ones, or also app-specific ones)? And so on.
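To make those choices concrete, here is a small sketch. It treats any name starting with `android.permission.` as "official", which is only a heuristic (some platform permissions live under other prefixes), and it encodes one app as 0/1 flags against a fixed vocabulary. The names `split_permissions` and `to_boolean_features` are hypothetical.

```python
OFFICIAL_PREFIX = "android.permission."  # heuristic, not an exhaustive rule


def split_permissions(permissions):
    """Separate permissions into (official-looking, app-specific) sorted lists."""
    unique = set(permissions)
    official = sorted(p for p in unique if p.startswith(OFFICIAL_PREFIX))
    custom = sorted(p for p in unique if not p.startswith(OFFICIAL_PREFIX))
    return official, custom


def to_boolean_features(permissions, vocabulary):
    """Encode one app as 0/1 flags, one per permission in the vocabulary."""
    requested = set(permissions)
    return [1 if p in requested else 0 for p in vocabulary]


app = ["android.permission.CAMERA", "com.example.permission.PUSH"]
official, custom = split_permissions(app)
vector = to_boolean_features(
    app, ["android.permission.CAMERA", "android.permission.SEND_SMS"]
)
```

Keeping the vocabulary fixed across apps is what makes the vectors comparable; app-specific permissions can be dropped or added to the vocabulary depending on what you decide to study.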
The structure for declaring a permission in the AndroidManifest.xml file is shown here:
In this way, several strings are used to declare the permission usage of different Android applications, such as android.permission.CAMERA or android.permission.SEND_SMS. I want to extract such permissions. Should they be extracted as booleans or in their original form? In the end I want to convert those permissions into an ARFF file so that it can be used for classification in Weka.
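As an illustration of what those declarations look like and how they can be read, here is a sketch using only the standard library. It assumes a manifest that has already been decoded to plain-text XML; the copy stored inside an APK is binary-encoded, so it has to be decoded first (Androguard or apktool can do that). The sample manifest and the function name `permissions_from_manifest` are made up for the example.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest with two permission declarations.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.CAMERA" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application />
</manifest>"""


def permissions_from_manifest(xml_text):
    """Return the sorted android:name values of all <uses-permission> elements."""
    root = ET.fromstring(xml_text)
    names = [
        element.get(f"{{{ANDROID_NS}}}name")  # namespaced attribute lookup
        for element in root.iter("uses-permission")
    ]
    return sorted({name for name in names if name})


permissions = permissions_from_manifest(SAMPLE_MANIFEST)
```

The original strings are kept at this stage; turning them into booleans is a separate step once the full set of permissions across all apps is known.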
Please suggest how I can extract the permissions with Androguard and how those permissions can be passed to my Python script as input. Then, hopefully, I will be able to convert them into ARFF using Python.
I want to classify any APK based on permission features (whether each permission is requested) using Weka, so an ARFF file has to be created. Can you suggest the steps?
I understand that I first need to extract the permissions from each app's AndroidManifest.xml using Androguard. How can I combine the permissions of multiple APKs into a single ARFF file?
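One possible way to do the combining step, sketched under assumptions: each APK has already been reduced to a pair of (set of permission strings, class label), the attribute list is the union of all permissions seen, and each APK becomes one dense row of 0/1 values. The function name `write_dense_arff` and the labels are hypothetical.

```python
import io


def write_dense_arff(samples, out, relation="apk_permissions"):
    """Write one ARFF row per (permissions, label) pair; return the attribute order."""
    samples = list(samples)
    # The attribute list is the union of every permission seen in any APK.
    vocabulary = sorted(set().union(*(perms for perms, _ in samples)))
    labels = sorted({label for _, label in samples})

    out.write(f"@relation {relation}\n\n")
    for perm in vocabulary:
        out.write(f"@attribute {perm} {{0,1}}\n")
    out.write(f"@attribute class {{{','.join(labels)}}}\n\n@data\n")

    for perms, label in samples:
        flags = ["1" if perm in perms else "0" for perm in vocabulary]
        out.write(",".join(flags + [label]) + "\n")
    return vocabulary


samples = [
    ({"android.permission.CAMERA"}, "benign"),
    ({"android.permission.CAMERA", "android.permission.SEND_SMS"}, "malware"),
]
arff = io.StringIO()
vocabulary = write_dense_arff(samples, arff)
```

So the steps would be: extract the permissions of every APK, collect them with their labels in one list, then write that list out once. Replace the `StringIO` with `open("apks.arff", "w")` (placeholder name) to get a file that Weka can load.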