It depends on several factors, such as the attributes of your dataset and your end goal. Data can come in many forms/types: numerical, categorical, time series or text. Different models handle different data types and produce different results. For instance, Naive Bayes is a simple yet powerful algorithm for predictive modelling, but it may not handle all data types equally well. AdaBoost works well with numerical data. KNN, SVM, etc. can handle both classification and regression tasks, and logistic regression is related to linear regression but is used for classification. In short:
1. Study your data properties
2. Clarify your objective, i.e. what you intend to achieve with the data (time series forecasting, classification/prediction, etc.).
3. Study available models that can handle your data properly.
4. Then work on how to improve the performance of the chosen algorithm.
You can try WEKA, an open-source machine learning workbench. When you load your data into WEKA, the tool will only enable the models that are compatible with your dataset.
For different learning algorithms check these sites –
Hello Rajeswari Devarajan, I have also encountered the challenge of small datasets in my research on analysing 3D models of parts. Using an autoencoder to compress the data could help (see the sketch below).
The danger is overfitting the model. Have you tried data augmentation to increase the size of the dataset?
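A minimal autoencoder sketch in Keras, assuming the 3D-part data has already been flattened into fixed-length feature vectors; the layer sizes and the placeholder data are illustrative assumptions, not details from the original question.

```python
# Minimal autoencoder sketch (Keras) for compressing feature vectors
# before training a downstream model on a small dataset.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 256   # assumed input dimensionality
latent_dim = 16    # size of the compressed representation

inputs = keras.Input(shape=(n_features,))
encoded = layers.Dense(64, activation="relu")(inputs)
encoded = layers.Dense(latent_dim, activation="relu")(encoded)
decoded = layers.Dense(64, activation="relu")(encoded)
decoded = layers.Dense(n_features, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(200, n_features).astype("float32")  # placeholder data
autoencoder.fit(X, X, epochs=50, batch_size=16, verbose=0)

# Feed the compressed features to a simpler classifier/regressor.
X_compressed = encoder.predict(X)
```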
For small datasets, one thing you must avoid is overfitting the data; hence simple machine learning models like Logistic Regression, Linear Regression and Bayesian Linear Regression will do fine.
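A minimal sketch of those simple models in scikit-learn; the synthetic datasets are only placeholders for a small real dataset.

```python
# Simple, low-variance models for small datasets (scikit-learn).
from sklearn.linear_model import LogisticRegression, LinearRegression, BayesianRidge
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification, make_regression

# Small classification problem -> logistic regression
X_clf, y_clf = make_classification(n_samples=80, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)
print("Logistic regression CV accuracy:",
      cross_val_score(clf, X_clf, y_clf, cv=5).mean())

# Small regression problem -> linear / Bayesian linear regression
X_reg, y_reg = make_regression(n_samples=80, n_features=10, noise=5.0, random_state=0)
for model in (LinearRegression(), BayesianRidge()):
    score = cross_val_score(model, X_reg, y_reg, cv=5, scoring="r2").mean()
    print(type(model).__name__, "CV R^2:", round(score, 3))
```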
As others suggested, you can use any of the machine learning algorithms that support your dataset's attributes. I am not quite sure what you mean by a weak dataset: is it a class-imbalanced dataset? If my guess is right, you may need to use any of these algorithms with k-fold cross-validation.
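A short sketch of k-fold cross-validation on an imbalanced dataset with scikit-learn; the 90/10 class ratio and the choice of random forest are assumptions for illustration.

```python
# Stratified k-fold cross-validation on a class-imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=8,
                           weights=[0.9, 0.1], random_state=42)

# Stratified folds preserve the class ratio in every fold, which matters
# when the minority class has only a handful of samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = RandomForestClassifier(class_weight="balanced", random_state=42)

scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
print("Balanced accuracy per fold:", scores)
print("Mean:", scores.mean())
```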
You can also generate synthetic data that is statistically similar to your original data (with caution).
Having said that, an ensemble algorithm such as AdaBoost with a reweighting method should be able to solve the problem. AdaBoost supports several simple and complex algorithms as its base classifier. With this you can find out which base classifier works best for your samples, as in the sketch below.
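A sketch of comparing AdaBoost with different base classifiers via cross-validation in scikit-learn; the base learners listed are only examples, and they must support sample weights.

```python
# Compare AdaBoost with different base classifiers (scikit-learn).
# Note: in scikit-learn < 1.2 the keyword is `base_estimator` instead of `estimator`.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=150, n_features=10, random_state=0)

base_learners = {
    "decision stump": DecisionTreeClassifier(max_depth=1),
    "shallow tree":   DecisionTreeClassifier(max_depth=3),
    "logistic reg.":  LogisticRegression(max_iter=1000),
}

for name, base in base_learners.items():
    ada = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)
    score = cross_val_score(ada, X, y, cv=5).mean()
    print(f"AdaBoost + {name}: {score:.3f}")
```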
If you have a very small dataset, I think you'd better use, if possible, a pre-trained model exploiting the following techniques, which are very useful when there is not enough data to build a full model from scratch, as in your case.
Transfer learning, used to transfer the abilities of a pre-trained model to another task.
Fine-tuning, used to incrementally adapt the pre-trained features to your specific dataset.
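A minimal transfer-learning and fine-tuning sketch in Keras, assuming an image task; the base network (MobileNetV2), the input size and the number of unfrozen layers are illustrative choices, not details from the question.

```python
# Transfer learning and fine-tuning sketch (Keras).
from tensorflow import keras
from tensorflow.keras import layers

# 1) Transfer learning: reuse a pre-trained feature extractor, train only the head.
base = keras.applications.MobileNetV2(weights="imagenet",
                                      include_top=False,
                                      input_shape=(160, 160, 3))
base.trainable = False  # freeze the pre-trained weights

inputs = keras.Input(shape=(160, 160, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # e.g. binary classification
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=10)   # train_ds is your own small dataset

# 2) Fine-tuning: unfreeze the top of the base network and continue training
#    with a much lower learning rate.
base.trainable = True
for layer in base.layers[:-20]:   # keep most layers frozen
    layer.trainable = False
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)
```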
If you can't proceed like this, you must train a new model from scratch, and the risk is overfitting your training data. To avoid this, you should avoid complex models and over-parameterization. In addition, you can (see the sketch after this list):
Enrich your data by adding synthetic samples (you can use an oversampling technique such as SMOTE)
Use an ensemble learning model, i.e. combine the predictions of different, weakly correlated weak learners according to the most appropriate strategy (boosting, voting, stacking/meta-learning)
Use some regularization mechanisms to avoid overfitting (such as dropout, L1, L2 and so on)
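A sketch combining the three ideas above with scikit-learn and the imbalanced-learn package (an extra dependency): SMOTE oversampling, a soft-voting ensemble of weakly correlated learners, and L2 regularization on the logistic member. The data and parameters are illustrative assumptions.

```python
# SMOTE + voting ensemble + regularization (scikit-learn / imbalanced-learn).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=120, n_features=10,
                           weights=[0.8, 0.2], random_state=1)

# Weakly correlated learners combined by soft voting;
# C controls the strength of L2 regularization on the logistic model.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(C=0.5, max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)

# SMOTE inside the pipeline so synthetic samples are generated only from
# the training folds during cross-validation (no leakage).
pipe = Pipeline([("smote", SMOTE(random_state=1)), ("ensemble", ensemble)])
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```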