The real question here, to me, is not the size of the data but how representative the data is of the real problem domain. If your dataset is "complete" in representing the system, data size is immaterial. Based on my experience, I recommend Bayesian algorithms or SVM as the two best tools to explore. The real test of whether the system is memorising or generalising is to try it on data the system has never seen and check the quality of the prediction.
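A minimal sketch of that "test on unseen data" idea: hold out part of the data, never show it to the classifier during training, and score on it afterwards (assumes scikit-learn; X and y here are a synthetic placeholder dataset, not the poster's molecules).

```python
# Sketch: evaluate on data the classifier has never seen.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# placeholder data standing in for the real descriptors/labels
X, y = make_classification(n_samples=45, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = SVC(kernel="linear").fit(X_train, y_train)
print("accuracy on unseen data:", clf.score(X_test, y_test))
```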
Could you give us an idea of what the data looks like?
How many positive, and how many negative samples do you have?
What is the size, and the structure of each data sample?
Type of values: binary, 1-out-of-N, real-valued (bounded or not, and over what range), or a mixture of the above? Images, time series?
Different data structures will require different paradigms.
And any other constraints that apply,
e.g. is avoiding false positives more important than avoiding false negatives?
In general, dealing with small datasets is always a challenge for machine learning approaches, because statistics and regression demand larger datasets to estimate or model the underlying probability distribution.
Mr. Bilal and Mr. Goerke, I really appreciate your quick replies. I am working on a dataset of 45 analog molecules. A number of continuous descriptors were calculated for this dataset, and the molecules are categorized as 14 actives and 31 inactives. On the basis of this small dataset, I want to generate some sort of supervised classification model or decision tree.
The simplest models would be best, because you probably don't have enough data to reliably estimate the many parameters of more complex classifiers. I'd suggest a linear classifier (linear discriminant / Fisher / nearest mean / logistic) or nearest neighbor to get an idea of how your data behaves.
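A quick comparison of such simple classifiers could look like the sketch below (assumes scikit-learn; the 45-sample matrix here is a synthetic placeholder):

```python
# Sketch: cross-validated comparison of a few simple classifiers.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

# placeholder: 45 samples, 8 descriptors, roughly 31/14 class balance
X, y = make_classification(n_samples=45, n_features=8,
                           weights=[0.69, 0.31], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "logistic": LogisticRegression(max_iter=1000),
    "nearest mean": NearestCentroid(),
    "1-NN": KNeighborsClassifier(n_neighbors=1),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```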
If you are interested in molecule activity, you might want to take a look at Multiple Instance Learning. The idea is to represent a molecule not by a single descriptor vector, but by a set of descriptor vectors, one for each of the ways the molecule can fold.
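A toy sketch of the multiple-instance idea (not a full MIL method): each molecule is a "bag" of descriptor vectors, one per conformation; a simple baseline trains an instance-level classifier on the bag labels and calls a bag active if any of its instances is predicted active. All data and sizes below are hypothetical placeholders.

```python
# Minimal multiple-instance baseline sketch (assumes scikit-learn/NumPy).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy bags: one (n_conformations x n_descriptors) array per molecule
bag_labels = np.array([0, 0, 1, 1, 0, 1])
bags = [rng.normal(loc=lbl, size=(rng.integers(2, 5), 6)) for lbl in bag_labels]

# naive baseline: every instance inherits its bag's label
X = np.vstack(bags)
y = np.concatenate([[lbl] * len(b) for b, lbl in zip(bags, bag_labels)])
clf = SVC(kernel="linear").fit(X, y)

# a bag is predicted active if any of its conformations is predicted active
bag_pred = [int(clf.predict(b).max()) for b in bags]
print("bag predictions:", bag_pred)
```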
You may want to look into using Rough Set Theory (RST). Unlike classical set theory, where the boundary between what is and isn't in a set is crisp, RST provides a region between what is definitely in the set and what is definitely not in the set (based on your data) that is uncertain. As a result, you could build a ternary classifier (yes, no, and "I don't know"), which would probably be more meaningful given your small dataset.
Take a look at the attached picture for what a rough set looks like. Each box in the grid represents an equivalence class. The dashed line represents the boundary of the actual, but unknown, set. The lightly shaded boxes within the boundary represent what is known to be definitely in the set. The heavily shaded boxes represent what is uncertain: you don't know whether or not it is in the set at this granularity.
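A tiny sketch of the lower/upper approximation idea, using made-up objects and discretized attributes (not a full RST library): equivalence classes entirely inside the target set form the "definitely in" region, classes that only intersect it form the "don't know" boundary.

```python
# Toy rough-set approximations over hypothetical discretized data.
from collections import defaultdict

# hypothetical objects: id -> (attribute tuple, label)
objects = {
    1: (("low", "yes"), "active"),
    2: (("low", "yes"), "active"),
    3: (("low", "no"), "inactive"),
    4: (("high", "yes"), "active"),
    5: (("high", "yes"), "inactive"),  # same attributes, different label -> boundary
}

# group objects into equivalence classes by their attribute values
classes = defaultdict(set)
for obj, (attrs, _) in objects.items():
    classes[attrs].add(obj)

target = {o for o, (_, label) in objects.items() if label == "active"}
lower = {o for c in classes.values() if c <= target for o in c}   # definitely active
upper = {o for c in classes.values() if c & target for o in c}    # possibly active

print("definitely active:", lower)
print("boundary (don't know):", upper - lower)
print("definitely inactive:", set(objects) - upper)
```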
As Zhaoqiang Xia said, SVMs seem to work in such cases. I have used them successfully with small datasets, as long as you are able to identify a discriminating parameter.
The accuracy depends very much on the data characteristics. If the classes are well separated, results will be good even with a simple classifier. How about looking at the learning curve to ascertain this?
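A short sketch of that learning-curve check (assumes scikit-learn; placeholder data standing in for the real descriptors):

```python
# Sketch: learning curve to see whether more data would still help.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=45, n_features=8, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    SVC(kernel="linear"), X, y,
    train_sizes=np.linspace(0.4, 1.0, 4),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    shuffle=True, random_state=0)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:3d}  train={tr:.2f}  validation={va:.2f}")
```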
In general, there is no classifier that always wins. In fact, the no-free-lunch theorem shows just that: given any classifier, a dataset can always be constructed on which it performs very poorly. Thus, it is the data that is important, not the classifier per se.
With such a small dataset, a good choice is the Self-Organizing Map (SOM), as it can interpolate gracefully (a good toolbox for Matlab is available at http://www.cis.hut.fi/projects/somtoolbox/). SOM is non-linear and unsupervised, behaves smoothly with linear initialization, and can show you a two-dimensional projection of the multidimensional data space, so you can identify how the properties of the dataset are clustered and interrelated.
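If you prefer Python over the Matlab toolbox linked above, a rough sketch using the third-party minisom package (a stand-in, not the toolbox itself; the data below is a random placeholder):

```python
# Sketch: a small SOM as a 2-D projection of the descriptor space.
import numpy as np
from minisom import MiniSom  # third-party package: pip install minisom

rng = np.random.default_rng(0)
X = rng.normal(size=(45, 8))                  # placeholder: 45 molecules x 8 descriptors
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize before training

som = MiniSom(5, 5, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, num_iteration=500)

# map each sample to its best-matching unit on the 5x5 grid
for i, x in enumerate(X[:5]):
    print(f"sample {i} -> grid cell {som.winner(x)}")
```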
One other technique you might want to consider applying to your small dataset is "bootstrapping". This technique creates pseudo-replicate datasets from a single dataset through resampling. I used it successfully in a past project on a dataset of 40 samples. To read about my experience, check out my paper "Neural Network Approach For Estimating Mass Moments of Inertia and Center of Gravity in Military Vehicles" in my RG profile.
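A minimal sketch of that resampling step: draw pseudo-replicate datasets of the same size, with replacement (assumes NumPy; the 40-sample data here is a placeholder, not the data from the paper).

```python
# Sketch: bootstrap pseudo-replicates by resampling with replacement.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))              # placeholder: 40 samples, 5 features
y = rng.integers(0, 2, size=40)           # placeholder labels

n_replicates = 100
replicates = []
for _ in range(n_replicates):
    idx = rng.integers(0, len(X), size=len(X))   # indices drawn with replacement
    replicates.append((X[idx], y[idx]))

print(len(replicates), "pseudo-replicate datasets of shape", replicates[0][0].shape)
```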
If you want to try SVM and the usual RBF kernel does not work well, try a kernel with a lower "capacity". RBF kernels often lead to high-quality results in many applications because they can realize any decision boundary when enough training data is provided, but they are often too complex for very small problems, especially if the training data contains noise of some sort. Less complex kernels, such as linear or lower-degree polynomial kernels, might be a solution in those cases. In any case, SVM hyperparameters should be chosen with care, using cross-validation or some bootstrap technique.
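One way to do that kernel/hyperparameter comparison with cross-validation (assumes scikit-learn; placeholder data and an arbitrary parameter grid):

```python
# Sketch: compare kernels of increasing capacity via cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=45, n_features=8, random_state=0)

param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "gamma": ["scale", 0.1, 1], "C": [0.1, 1, 10]},
]
search = GridSearchCV(
    SVC(), param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)

print("best kernel/parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 2))
```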
For a binary classification problem, support vector machines are definitely a strong choice. Remember to check the confusion matrix to get a better understanding of the accuracy, since if one of the classes is large enough, you may get high accuracy simply by classifying all instances as that class.
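For example (assumes scikit-learn; placeholder imbalanced data), inspecting the confusion matrix instead of the raw accuracy:

```python
# Sketch: confusion matrix from cross-validated predictions.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=45, n_features=8,
                           weights=[0.69, 0.31], random_state=0)

y_pred = cross_val_predict(
    SVC(kernel="linear"), X, y,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

print(confusion_matrix(y, y_pred))  # rows: true class, columns: predicted class
```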
If a classifier performs poorly (it is called a weak classifier, and its performance is usually around 50% or a little above), then boosting methods may be a good choice for classification.
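A small sketch of boosting a weak learner (assumes scikit-learn; placeholder data, and AdaBoost's default base learner is a depth-1 decision "stump"):

```python
# Sketch: compare a weak learner against its boosted ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=45, n_features=8, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

weak = DecisionTreeClassifier(max_depth=1)              # a decision stump
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)

print("weak learner:", cross_val_score(weak, X, y, cv=cv).mean().round(2))
print("boosted     :", cross_val_score(boosted, X, y, cv=cv).mean().round(2))
```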
Consider also analyzing your features first. There may be redundant information in the feature set, so you may want to emphasize powerful features and eliminate weak ones. For this purpose, mRMR or similar feature selection and analysis tools can be used before conducting classification.
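As a simple stand-in for mRMR (which additionally penalizes redundancy between features), a mutual-information filter can give a first ranking (assumes scikit-learn; placeholder data):

```python
# Sketch: rank and keep the most informative features before classification.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=45, n_features=20, n_informative=4,
                           random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("selected feature indices:", selector.get_support(indices=True))

X_reduced = selector.transform(X)  # feed X_reduced to the downstream classifier
```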
SVM is a strong alternative. It is a very strong classifier, but parameter tuning is difficult when the data is huge.
There is no generally best method; it depends a lot on the dataset itself. In my opinion, the best pipeline is (1) to try different features with one or more machine learning or other statistical algorithms, and perform permutation tests to select the significant features (and avoid overfitting to uninformative features); and (2) to try different algorithms and compare their performance with a bootstrapping or cross-validation strategy (AUC or other cutoff-independent metrics are suggested).
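A sketch of both steps (assumes scikit-learn; placeholder data and an arbitrary linear SVM as the example algorithm): a permutation test to check that the score beats chance, and cross-validated AUC for comparing algorithms.

```python
# Sketch: permutation test plus cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     permutation_test_score)
from sklearn.svm import SVC

X, y = make_classification(n_samples=45, n_features=8, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC(kernel="linear")

score, perm_scores, p_value = permutation_test_score(
    clf, X, y, cv=cv, n_permutations=200, random_state=0)
print(f"accuracy={score:.2f}, permutation p-value={p_value:.3f}")

auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"cross-validated AUC={auc.mean():.2f}")
```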
As most of the answers point out, it all depends on the quality of the data. If it is a practical problem where the result can be judged by a human (and connects to a solution), and there is a small set of data (few cases) with many attributes, I mostly use Case-Based Reasoning (CBR) as a methodology and apply different similarity metrics and combinations of them (many mentioned in previous answers) until the user/expert agrees with the results; this works surprisingly well in many real applications. So if the conditions are right and nothing else works well, have a look at CBR.
Well, the main problem is the size of the dataset: 45 samples is really small. The second question we should ask is what the dimensionality of your problem is.
But in general you have to try several methods, keeping in mind some rules for choosing the best solution:
1) In your case the most important thing is the validation process. If you just consider the confusion matrix without a good validation method, you may overfit the data. In the case of a small sample size, the best option is bootstrapping, because it allows resampling many times and thus correctly estimating the accuracy (see the bootstrap sketch after this list). I don't recommend leave-one-out, because its results are usually overestimated.
3) Start tuning your classifier from the simplest and then try more complex (flexible) ones: as Michael Kemmler suggested, try a linear SVM, then change the kernel to polynomial and try increasing its order.
4) You may also consider other popular methods like Naive Bayes or decision trees, which usually work well on that kind of data.
5) For small datasets, one-nearest-neighbor also works well.
6) For complex Boolean problems, creating new features works very well, for example by performing a linear projection using different methods; you can try the simplest, like LDA, and then apply other classification methods. For example, in the case of the XOR problem, projecting the data onto the diagonal makes the problem solvable by almost any classifier (except linear methods).
But as I said at the start, remember to carefully estimate the accuracy of your model.
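A sketch of the bootstrap validation mentioned in point 1: train on bootstrap resamples and score on the left-out (out-of-bag) samples (assumes scikit-learn/NumPy; placeholder data and an arbitrary linear SVM).

```python
# Sketch: bootstrap (out-of-bag) estimate of classifier accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=45, n_features=8, random_state=0)
rng = np.random.default_rng(0)

scores = []
for _ in range(200):
    idx = rng.integers(0, len(X), size=len(X))    # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(len(X)), idx)    # out-of-bag samples
    if len(oob) == 0 or len(np.unique(y[idx])) < 2:
        continue                                  # skip degenerate resamples
    clf = SVC(kernel="linear").fit(X[idx], y[idx])
    scores.append(clf.score(X[oob], y[oob]))

print(f"bootstrap (out-of-bag) accuracy: {np.mean(scores):.2f} "
      f"+/- {np.std(scores):.2f}")
```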
1. Type of data: how many categorical variables and how many quantitative variables are in the dataset?
2. Are you trying to mimic decisions taken by human beings (e.g. loan-granting decisions), or is it data like the example flower (Iris) dataset? The latter is a difficult classification problem.
...In my case I have 16 sets of experimental data with 4 input variables and 3 output variables. I want a larger dataset for further modelling. Can I use the bootstrapping method to generate more data from the 16 sets of experiments?