I need benchmark datasets of Android malware/benign API call sequences. An API call sequence is the ordered sequence of APIs invoked while an Android application is run under dynamic analysis.
Dear sir, various online repositories share malware samples, some for free and some through paid subscriptions. Some well-known free repositories are the following:
The best solution is to extract the dynamic API calls yourself. You can run the applications in a virtual machine environment, monitor their execution, and record the API calls they make. This gives you clearer dynamic analysis reports and a more accurate understanding of each program's semantics. You can also compare your own traces side by side with some of the online datasets.
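As an illustration of the "record the APIs" step, here is a minimal sketch that turns a raw trace log into an ordered API call sequence. It assumes a hypothetical log format of one call per line, written as `timestamp pid api_name`; real hooking or tracing tools each have their own output format, so the parsing would need to be adapted. The function name `parse_api_trace` is also hypothetical.

```python
from typing import Iterable, List, Optional


def parse_api_trace(lines: Iterable[str], pid: Optional[int] = None) -> List[str]:
    """Convert hypothetical 'timestamp pid api_name' log lines into an
    ordered API call sequence, optionally keeping only one process."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        ts_text, pid_text, api = parts
        try:
            ts = float(ts_text)
            line_pid = int(pid_text)
        except ValueError:
            continue  # skip lines whose timestamp or pid is not numeric
        if pid is not None and line_pid != pid:
            continue  # ignore calls made by other processes
        records.append((ts, api))
    # Order by timestamp; Python's sort is stable, so ties keep log order.
    records.sort(key=lambda record: record[0])
    return [api for _, api in records]


if __name__ == "__main__":
    log = [
        "0.120 4321 android.telephony.TelephonyManager.getDeviceId",
        "0.050 4321 java.net.URL.openConnection",
        "0.300 9999 android.app.Activity.onCreate",
        "garbage",
        "0.410 4321 android.telephony.SmsManager.sendTextMessage",
    ]
    print(parse_api_trace(log, pid=4321))
```

Filtering by process ID matters because a device or VM trace usually interleaves calls from many processes, while an API call sequence is normally defined per application.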
While I am not sure whether API call sequences are readily available, a large collection of malware datasets is available at https://www.unb.ca/cic/datasets/index.html.
It is better to first read an article that explains feature extraction. There are many datasets available (see, for example, "Two Anatomists Are Better than One—Dual-Level Android Malware Detection" and "Feature Importance in Android Malware Detection").
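One common way to extract features from API call sequences is to count contiguous n-grams of calls and use the counts as a fixed-length vector. The sketch below shows that idea for bigrams. It is not the method of either paper mentioned above, and the function names (`api_ngrams`, `build_vocabulary`, `vectorize`) and the short API names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def api_ngrams(seq: Sequence[str], n: int = 2) -> Counter:
    """Count contiguous n-grams of API calls in one sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def build_vocabulary(sequences: List[Sequence[str]], n: int = 2) -> Dict[NGram, int]:
    """Assign each distinct n-gram a column index, in order of first appearance."""
    vocab: Dict[NGram, int] = {}
    for seq in sequences:
        for gram in api_ngrams(seq, n):
            if gram not in vocab:
                vocab[gram] = len(vocab)
    return vocab


def vectorize(seq: Sequence[str], vocab: Dict[NGram, int], n: int = 2) -> List[int]:
    """Map one sequence to a count vector; n-grams unseen in vocab are dropped."""
    vec = [0] * len(vocab)
    for gram, count in api_ngrams(seq, n).items():
        idx = vocab.get(gram)
        if idx is not None:
            vec[idx] = count
    return vec


if __name__ == "__main__":
    malware = ["TelephonyManager.getDeviceId", "URL.openConnection",
               "SmsManager.sendTextMessage"]
    benign = ["Activity.onCreate", "URL.openConnection",
              "Activity.onCreate", "URL.openConnection"]
    vocab = build_vocabulary([malware, benign])
    print(vectorize(malware, vocab))
    print(vectorize(benign, vocab))
```

The vocabulary should be built from the training sequences only, so that the same columns can be reused unchanged for test applications.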