Suitable Machine Learning algorithm?

I believe your main concern is whether to choose a generative or a discriminative model. In your case, since the two datasets are not drawn from a joint distribution, a discriminative model should perform better. A discriminative model describes only the target variable conditional on the observed variables.

Generative models are more flexible, yes. However, they are unlikely to be optimal for your analysis.

An example of a discriminative algorithm is the Support Vector Machine (SVM). I am sure you can find many papers that have used SVMs to study protein-protein interactions.
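
Whether a discriminative model actually does better on a given dataset can be checked empirically. The following is only a minimal sketch, assuming scikit-learn is installed: the data is synthetic (`make_classification` stands in for a labelled expression matrix), and Gaussian naive Bayes and logistic regression are illustrative picks for the generative and discriminative side, not methods prescribed in this thread.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a labelled data matrix (samples x features).
# Replace X and y with your own data.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=10, random_state=0
)

models = {
    # Generative: models p(x | y) and p(y), then applies Bayes' rule.
    "generative (Gaussian naive Bayes)": GaussianNB(),
    # Discriminative: models p(y | x) directly.
    "discriminative (logistic regression)": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean accuracy = {score:.3f}")
```

The ranking on synthetic data says nothing about your own data; the point is only that the comparison takes a few lines to run.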

Best.

Morteza Heidari

There are many machine learning systems for data classification, but they can be grouped into two main categories: local and global classifiers. Depending on the dataset, its feature vectors, and its size, either a local or a global classifier may give the better final model. It is therefore reasonable to train and test one local and one global classifier on your data and see which gives better results. KNN and SVM are widely used representatives of the two groups.

KNN is an instance-based "lazy" learning method that builds its classification function locally, from the neighbours of each query point. SVM is an "eager" learning method that is trained on the entire training set to build a global model fitting the training data.

Therefore, I recommend building a model of your data with both of them to find out which one is better.
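
A comparison along those lines could look like the sketch below. It assumes scikit-learn and uses synthetic data in place of the real dataset; `n_neighbors=5`, the RBF kernel, and 5-fold cross-validation are placeholder choices for illustration, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data; substitute your own feature matrix and labels.
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=8, random_state=0
)

# Both methods are sensitive to feature scale, so each is wrapped in a
# pipeline that standardizes the features inside every training fold.
classifiers = {
    "KNN (local, lazy)": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "SVM (global, eager)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

results = {}
for name, clf in classifiers.items():
    fold_scores = cross_val_score(clf, X, y, cv=5)
    results[name] = (fold_scores.mean(), fold_scores.std())
    print(f"{name}: accuracy = {results[name][0]:.3f} +/- {results[name][1]:.3f}")
```

Reporting the spread across folds alongside the mean helps judge whether a difference between the two classifiers is meaningful on a small dataset.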

Jumoke Soyemi

Thanks Anirban and Morteza for your contributions.

I will try your recommendations.

Munir Ahmad

Note: If you are using a Support Vector Machine, the choice of kernel is very important, so I recommend evaluating your model with different kernels: linear, polynomial, RBF (radial basis function), and others.
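
As a rough sketch of such a check, assuming scikit-learn: the loop below scores an SVM with each of scikit-learn's built-in kernel names on synthetic placeholder data. All other hyperparameters (`C`, `gamma`, `degree`) are left at their defaults here, which a real study would also tune.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data; substitute your own feature matrix and labels.
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=8, random_state=0
)

# Built-in kernel names accepted by sklearn.svm.SVC.
kernels = ["linear", "poly", "rbf", "sigmoid"]

kernel_scores = {}
for kernel in kernels:
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    # Mean 5-fold cross-validated accuracy for this kernel.
    kernel_scores[kernel] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{kernel}: mean accuracy = {kernel_scores[kernel]:.3f}")

best_kernel = max(kernel_scores, key=kernel_scores.get)
print("best kernel on this data:", best_kernel)
```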

Thanks

Munir

Osman Ali Sadek Ibrahim

Hi Jumoke,

You can use scikit-learn as a user (rather than as a developer) to try out a new method:

http://scikit-learn.org/dev/_downloads/scikit-learn-docs.pdf

Thanks & Best wishes

Osman

Stéphane Breton

Dear Jumoke,

Linear and nonlinear classifiers based on the kernel trick are appropriate tools for your problem. I suggest the following references as a good starting point:

Lanckriet et al., "A statistical framework for genomic data fusion", 2004 - https://pdfs.semanticscholar.org/32e2/3cb933ed9b23d7da6da79645b8cd173ef68e.pdf
Ben-Hur et al., "Support Vector Machines and Kernels for Computational Biology", 2012 - http://www.mgene.org/lectures/MLSSKernelTutorial2012/text.pdf
Khondoker et al., "A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies", 2013 - http://journals.sagepub.com/doi/pdf/10.1177/0962280213502437
Neelima et al., "A comparative Study of Machine Learning Classifiers over Gene expressions towards Cardio Vascular Diseases Prediction", 2017 - https://www.ripublication.com/ijcir17/ijcirv13n3_07.pdf

Deep learning also has great potential for solving predictive problems. See:

Xie et al., "A Predictive Model of Gene Expression Using a Deep Learning Framework", 2016 - http://calla.rnet.missouri.edu/cheng/dn_gene_expression.pdf

Valuable references on computationally efficient deep learning based on the Extreme Learning Machine (ELM) paradigm:

Parkavi et al., "Recent Trends in ELM and MLELM: A review", 2017 - https://www.astesj.com/publications/ASTESJ_020108.pdf
Cambria et al., "Extreme Learning Machines", 2013 - https://pdfs.semanticscholar.org/19c3/7e976fd302e9cfc2de8a3adddb4ce48c4335.pdf
Huang et al., "Trends in Extreme Learning Machines: A Review", 2014 - https://www.researchgate.net/publication/267339744_Trends_in_Extreme_Learning_Machines_A_Review

Many reviews can be found by searching for the keywords "microarray gene expression data". For example:

Pirooznia et al., "A comparative study of different machine learning methods on microarray gene expression data", 2008 - https://www.researchgate.net/profile/Mehdi_Pirooznia/publication/5485070_A_comparative_study_of_different_machine_learning_methods_on_microarray_gene_expression_data/links/09e41509c0eb53d07e000000.pdf
Ding et al., "Minimum redundancy feature selection from microarray gene expression data", 2005 - https://pdfs.semanticscholar.org/57e3/ddb142afd7dde5b4f39ee47e8e057d996bdb.pdf

Note: if needed, Thompson et al. developed a cross-platform normalization of microarray and RNA-seq data for machine learning applications. See:

Thompson et al., "Cross-platform normalization of microarray and RNA-seq data for machine learning applications", 2016 - https://peerj.com/articles/1621.pdf

Best Regards

Jumoke Soyemi

Thanks Stéphane for your answers

Patricia Ryser-Welch

Have you considered a form of generative meta-learning or hyper-heuristics? These techniques generate general algorithms from a set of specific operators.

Jumoke Soyemi

Jin and Patricia, I appreciate your contributions.

I noticed that nobody has so far suggested a Bayesian network approach for this task.

Is there a reason the approach would not work if I tried it?

Stéphane Breton

Of course, a Bayesian approach may be used to solve your problem. You may have a look at "Controlling for Confounding Effects in Single Cell RNA Sequencing Studies using Both Control and Target Genes" by Chen et al., 2016, and its bibliography:

http://www.biorxiv.org/content/biorxiv/early/2016/09/14/045070.full.pdf

Junaid Ali Reshi

The machine learning approach you use will also depend on the quality of the data. By quality, I mean the skew in the data (if any), the number of instances, and the volume of the training dataset, among other factors.

The approach will also depend on whether you want decision trees, whether you want a set of rules, or whether a black-box classifier that simply outputs the class is acceptable.

Among others, Random Forest is a good contender, followed by the J48 decision tree. SVM is a bit trickier because of kernel selection and the tuning it requires.
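
To see how skew changes the picture, something like the following sketch may help. It assumes scikit-learn and synthetic data with an 80/20 class imbalance. J48 is WEKA's C4.5 implementation, which scikit-learn does not provide, so `DecisionTreeClassifier` (a CART-style tree) is used here only as a stand-in, and balanced accuracy is one possible skew-aware metric among several.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately skewed data: about 80% of samples in class 0.
X, y = make_classification(
    n_samples=300,
    n_features=30,
    n_informative=8,
    weights=[0.8, 0.2],
    random_state=0,
)

# Stratified folds keep the class ratio roughly constant in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for J48: a CART-style tree, not C4.5.
    "single decision tree": DecisionTreeClassifier(
        criterion="entropy", random_state=0
    ),
}

# Balanced accuracy averages per-class recall, so a model that always
# predicts the majority class scores 0.5 rather than 0.8.
skew_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy").mean()
    for name, model in candidates.items()
}

for name, score in skew_scores.items():
    print(f"{name}: mean balanced accuracy = {score:.3f}")
```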

Have you tried WEKA for the implementation? Its tools for finding the best-performing classifier may come in handy.

Hope this helps.

Jumoke Soyemi

Ok Stéphane. Thank you

Jumoke Soyemi

Alright Junaid, let me take a look at WEKA and see how it works. I have never tried WEKA before now. Thanks

Stéphane Breton

WEKA is fully documented in "Data Mining-Practical Machine Learning Tools and Techniques" by I.H. Witten and E. Frank, 2nd Edition, 2005 available from:

http://cs.du.edu/~mitchell/mario_books/Data_Mining:_Practical_Machine_Learning_Tools_and_Techniques_-_2e_-_Witten_&_Frank.pdf

Additional reference:

https://www.researchgate.net/profile/Mark_Hall6/publication/221900777_The_WEKA_data_mining_software_An_update/links/09e41507f01ad2a029000000.pdf
