Hello,

I am working on a binary classification problem.

My data consists of a set of independent variables (features) and a target (class label), as follows:

  • The data set contains between 50 and 300 instances.
  • Each instance is labeled with one of two classes: 1 = good, 0 = bad.
  • The independent variables (features) are all binary.
  • The number of features varies: 250, 500, or 750.
  • An example of the data set is attached in both CSV and Excel format.
  • Below are five examples (with 15 features each), for illustration only:

    features        | class
    101101111000110 | 0
    111111110101010 | 1
    010000010110101 | 1
    010101010011011 | 0
    001001100100111 | 0
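
To make the layout concrete, here is a minimal sketch of how rows in this "bits | class" form could be read into a feature matrix and a label vector. Python with NumPy is my own choice for illustration, the helper name parse_rows is hypothetical, and the attached CSV file may be laid out differently.

```python
import numpy as np

# The five illustrative rows from above, in "bits | class" form.
rows = [
    "101101111000110 | 0",
    "111111110101010 | 1",
    "010000010110101 | 1",
    "010101010011011 | 0",
    "001001100100111 | 0",
]

def parse_rows(lines):
    """Split each 'bits | class' line into a 0/1 feature vector and a label.

    Hypothetical helper for illustration only.
    """
    features, labels = [], []
    for line in lines:
        bits, label = (part.strip() for part in line.split("|"))
        features.append([int(b) for b in bits])
        labels.append(int(label))
    return np.array(features, dtype=np.uint8), np.array(labels, dtype=np.uint8)

X, y = parse_rows(rows)
print(X.shape)         # (number of instances, number of features)
print(y)               # class label per instance
print(X.mean(axis=0))  # fraction of ones in each feature column
```

With the real data, X would have 50 to 300 rows and 250, 500, or 750 columns, so there are more features than instances.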

Given this problem and data set, my questions are:

  • How can I decide whether the data set is linear or non-linear?
  • Is it possible to visualize the data? If so, can anyone help with how to do it?
  • How do I choose a good classifier for this data set, given that all features are binary (zeros and ones)?
  • For small data-set that has