A dataset with binary data for a two-class classification problem. How to decide if it is linear or non-linear? How to choose a good classifier?

Dear Ahmad Alhindi

For questions 1 and 2, see these earlier questions and answers:

https://www.researchgate.net/post/How_can_one_decide_on_using_a_linear_or_non_linear_classifier_for_the_dataset

https://www.researchgate.net/post/What_are_the_criteria_to_decide_the_dataset_is_linear_labled_non_linear_unlabled

https://www.quora.com/Given-a-dataset-how-can-I-tell-if-it-is-linear-or-non-linear

For question 3, you should try some algorithms such as Linear Regression, Perceptrons, Naive Bayes Classifier, Decision Trees, ...

For question 4, see this post:

https://www.quora.com/What-classification-model-would-you-recommend-for-this-small-dataset

For the last question, see:

https://www.researchgate.net/post/How_to_choose_appropriate_classifier

https://www.researchgate.net/post/How_to_decide_the_best_classifier_based_on_the_data-set_provided

Fahad A. Munir

If you are free to choose any classifier, then the data you describe can probably be classified using regression. For such a simple dataset I see no need for much more complex algorithms.

To find out whether the data is linear or not, I suggest building several regression classifiers, for instance linear, second-order and maybe third-order ones, and training each of them. For each one, measure the MSE on test data and accumulate the total error (various techniques are available for computing a classifier's total error). This gives you an idea of how linear or nonlinear your data is: for example, if the linear model predicts best, the data is linear or close to linear.

If the data has many dimensions, it cannot be visualized directly on plots, and I think visualizing individual features separately is not a good approach either. While doing all this, take care of the bias-variance dilemma. If your dataset is small, you can use cross-validation to detect overfitting. I hope this helps.
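The comparison of a linear and a second-order model described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (an XOR-style target on four binary features), the helper names `add_interactions` and `cv_mse` are made up for this example, and plain NumPy least squares stands in for whatever regression tool you prefer. Note that for 0/1 features the square of a feature equals the feature itself, so the only useful second-order terms are the pairwise products.

```python
import numpy as np

def add_interactions(X):
    """Append pairwise products x_i * x_j (i < j) to the feature matrix.

    For 0/1 features x_i**2 == x_i, so pure squares add nothing new.
    """
    n_features = X.shape[1]
    pairs = [X[:, i] * X[:, j]
             for i in range(n_features) for j in range(i + 1, n_features)]
    return np.column_stack([X] + pairs)

def cv_mse(X, y, k=5, seed=0):
    """Mean squared error of a least-squares fit, averaged over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    A = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        errors.append(np.mean((A[test] @ coef - y[test]) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in data (an assumption, not the asker's dataset).
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(200, 4)).astype(float)
y = np.logical_xor(X[:, 0], X[:, 1]).astype(float)

mse_linear = cv_mse(X, y)
mse_second = cv_mse(add_interactions(X), y)
print(f"linear: {mse_linear:.3f}  second order: {mse_second:.3f}")
```

On this toy target the second-order model should reach a far lower held-out error than the linear one, which is the signal that the relationship is not linear. On real data the gap will be less clear-cut, so compare the cross-validated errors rather than the training errors.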

Murat Cihan Sorkun

Hi Ahmad,

You can apply PCA to your features and then plot your points along the most significant components (found by PCA). This can help you see whether your classes are linearly separable or not.
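A minimal sketch of that PCA projection, assuming NumPy is available. The data and labels here are synthetic placeholders and `pca_project` is a hypothetical helper name; the projection is computed with a plain SVD rather than any particular PCA library.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Center X and project it onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)               # variance share per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Synthetic stand-in data: 100 samples, 6 binary features, hypothetical labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 6)).astype(float)
y = (X[:, :3].sum(axis=1) >= 2).astype(int)

Z, ratio = pca_project(X)
print("explained variance ratio:", np.round(ratio, 3))
for label in (0, 1):
    centroid = Z[y == label].mean(axis=0)
    print(f"class {label} centroid in PC space:", np.round(centroid, 3))
```

The two columns of `Z` can then be drawn as a scatter plot colored by `y` with any plotting library. If the classes separate cleanly in that plot, they are linearly separable in the original space as well; if they overlap, that alone does not prove they are inseparable, because the discarded components may carry the separating direction.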

Besides, I recommend using neural nets, which can represent both linear and nonlinear relations between your inputs and target.

Good luck!

Shashi Prakash Tripathi

To judge linearity versus non-linearity, data visualisation is suggested. For model selection, deep learning is not a good fit because the dataset is small; it is better to use Logistic Regression in this case.

Srikrishna Muppalaneni

Hello Ahmad Alhindi

Since all of your variables are binary and the target is also binary, you cannot apply the statistical tests that assume continuous variables. I recommend not using PCA or Kernel PCA, because the dimensionality reduction can cost you accuracy. You should check whether the target classes are balanced. A few days ago I did a project on imbalanced data with 70 binary variables; you can see it at this link: https://www.kaggle.com/krishna8ds/handling-imblanced-data-using-smote
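The class-balance check mentioned above takes only a few lines. This is a sketch with made-up labels (90 negatives, 10 positives), and `class_balance` is a hypothetical helper name; it uses only the Python standard library.

```python
from collections import Counter

def class_balance(labels):
    """Return per-class counts and the majority/minority count ratio."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), ratio

# Hypothetical target column: 90 samples of class 0, 10 of class 1.
y = [0] * 90 + [1] * 10

counts, imbalance_ratio = class_balance(y)
print("counts:", counts)
print("majority/minority ratio:", imbalance_ratio)
```

A ratio near 1 means the classes are roughly balanced. A large ratio suggests that plain accuracy will be misleading and that resampling or class weighting may be worth trying.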

Regards,

srikrishna

Abir Sen

Hi Ahmad ,

1. Since your dataset is in binary format, you can use a neural network model (multilayer perceptron, MLP) for two-class classification.

2. You can also use a support vector machine (SVM) to classify a linear dataset; in the non-linear case you can use the "kernel trick", which maps the non-linear dataset into a higher-dimensional space where it becomes linearly separable.

See:

https://towardsdatascience.com/kernel-function-6f1d2be6091
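To illustrate why moving to a higher-dimensional space helps, here is a small sketch on the XOR problem with two binary features. It is an assumption-laden toy: it uses an explicit feature map (adding the product of the two inputs) with a simple perceptron, not an actual SVM or kernel function. A degree-2 polynomial kernel would work implicitly in a feature space that contains this product term. The name `train_perceptron` is made up for the example.

```python
def train_perceptron(X, y, epochs=100):
    """Classic perceptron; returns (weights, bias, converged).

    `converged` is True only if a full pass makes no mistakes, which can
    happen only when the data is linearly separable.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):  # labels yi are +1 or -1
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified or on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False

# XOR on two binary features: not linearly separable in the raw space.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]

_, _, raw_ok = train_perceptron(X, y)

# Explicit second-order feature map: append the product x1 * x2.
X_mapped = [(x1, x2, x1 * x2) for x1, x2 in X]
_, _, mapped_ok = train_perceptron(X_mapped, y)

print("separable in raw space:   ", raw_ok)
print("separable after mapping:  ", mapped_ok)
```

The perceptron never finishes a clean pass on the raw inputs but does once the product feature is added. The same test gives a quick separability check on a real dataset, with the caveat that failing to converge within a fixed number of epochs is only evidence of non-separability, not proof.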

Tim vor der Brück

I would suggest trying out Gradient Boosting (XGBoost) and Random Forest.

Eugene Veniaminovich Lutsenko

As a research method we used automated system-cognitive analysis (ASC-analysis), a new method of artificial intelligence. It has its own software tool, an intelligent system called "Eidos" (open source software) [1, 2, 3].

The Eidos-X++ system differs from other artificial intelligence systems in the following respects:

- was developed in a universal setting, independent of the subject area. Therefore, it is universal and can be applied in many subject areas (http://lc.kubagro.ru/aidos/index.htm);

- is in full open free access (http://lc.kubagro.ru/aidos/_Aidos-X.htm), and with the relevant source texts (http://lc.kubagro.ru/__AIDOS-X.txt);

- is one of the first domestic personal-level artificial intelligence systems, i.e. it does not require special training in artificial intelligence technologies from the user (there is an act of implementation of the "Eidos" system dated 1987) (http://lc.kubagro.ru/aidos/aidos02/PR-4.htm);

- provides stable identification, in a comparable form, of the strength and direction of cause-effect relationships in incomplete, noisy, interdependent (nonlinear) data of very large dimension and of numerical and non-numerical nature, measured in different types of scales (nominal, ordinal and numerical) and in different units of measurement (i.e. it does not impose strict requirements on the data that cannot be met, but processes the data as they are) [12];

- contains a large number of local (supplied with the installation) and cloud educational and scientific applications (currently 31 and 152, respectively) (http://lc.kubagro.ru/aidos/Presentation_Aidos-online.pdf);

- provides multilingual interface support in 44 languages. Language databases are included in the installation and can be replenished automatically;

- supports on-line environment of knowledge accumulation and is widely used all over the world (http://aidos.byethost5.com/map5.php);

- performs the most computationally intensive operations, model synthesis and recognition, on a graphics processing unit (GPU), which for some tasks speeds up the solution several thousand times and makes intelligent processing of big data, big information and big knowledge practical;

- transforms the initial empirical data into information and then into knowledge, and uses this knowledge to solve problems of classification, decision support and research of the subject area by studying its system-cognitive model, generating a very large number of tabular and graphical output forms (development of cognitive graphics), many of which have no analogues in other systems (examples of forms can be found in: http://lc.kubagro.ru/aidos/aidos18_LLS/aidos18_LLS.pdf);

- imitates the human style of thinking well: it gives analysis results that experts can understand on the basis of their experience, intuition and professional competence.
