It depends on several factors, such as the attributes of your dataset and your end goal. Data can come in many forms/types: numerical, categorical, time series or text. Different models handle different data types and produce different results. For instance, Naive Bayes is a simple yet powerful algorithm for predictive modelling, but it may not handle all data types equally well. AdaBoost works well with numerical data. KNN, SVM, etc. can handle both classification and regression tasks, and logistic regression is related to linear regression but is used for classification. In short:
1. Study your data properties
2. Clarify your objective, i.e. what you intend to achieve with the data (time series forecasting, classification/prediction, etc.).
3. Study available models that can handle your data properly.
4. Then work on how to improve the performance of the chosen algorithm.
You can try WEKA, an open-source machine learning workbench. When you load your data into WEKA, the tool will only enable the models that are compatible with your dataset.
For different learning algorithms check these sites –
Hello Rajeswari Devarajan, I have also encountered the challenge of small datasets in my research on analysing 3D models of parts. Using an autoencoder to compress the data could help (see the sketch below).
The danger is overfitting the model. Have you tried data augmentation to increase the size of the dataset?
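A minimal autoencoder sketch in Keras, assuming the 3D-part data has already been flattened into fixed-length feature vectors; the layer sizes and the placeholder data are illustrative assumptions, not details from the original question.

```python
# Minimal autoencoder sketch (Keras) for compressing feature vectors
# before training a downstream model on a small dataset.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 256   # assumed input dimensionality
latent_dim = 16    # size of the compressed representation

inputs = keras.Input(shape=(n_features,))
encoded = layers.Dense(64, activation="relu")(inputs)
encoded = layers.Dense(latent_dim, activation="relu")(encoded)
decoded = layers.Dense(64, activation="relu")(encoded)
decoded = layers.Dense(n_features, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(200, n_features).astype("float32")  # placeholder data
autoencoder.fit(X, X, epochs=50, batch_size=16, verbose=0)

# Feed the compressed features to a simpler classifier/regressor.
X_compressed = encoder.predict(X)
```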
For small datasets, one thing you must avoid is overfitting the data; hence simple machine learning models like Logistic Regression, Linear Regression and Bayesian Linear Regression will do fine.
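A minimal sketch of those simple models in scikit-learn; the synthetic datasets are only placeholders for a small real dataset.

```python
# Simple, low-variance models for small datasets (scikit-learn).
from sklearn.linear_model import LogisticRegression, LinearRegression, BayesianRidge
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification, make_regression

# Small classification problem -> logistic regression
X_clf, y_clf = make_classification(n_samples=80, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)
print("Logistic regression CV accuracy:",
      cross_val_score(clf, X_clf, y_clf, cv=5).mean())

# Small regression problem -> linear / Bayesian linear regression
X_reg, y_reg = make_regression(n_samples=80, n_features=10, noise=5.0, random_state=0)
for model in (LinearRegression(), BayesianRidge()):
    score = cross_val_score(model, X_reg, y_reg, cv=5, scoring="r2").mean()
    print(type(model).__name__, "CV R^2:", round(score, 3))
```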
As others suggested, you can use any of the machine learning algorithms that support your dataset's attributes. I am not quite sure what you mean by a weak dataset: is it a class-imbalanced dataset? If my guess is right, you may need to use any of these algorithms with k-fold cross-validation.
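A short sketch of k-fold cross-validation on an imbalanced dataset with scikit-learn; the 90/10 class ratio and the choice of random forest are assumptions for illustration.

```python
# Stratified k-fold cross-validation on a class-imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=8,
                           weights=[0.9, 0.1], random_state=42)

# Stratified folds preserve the class ratio in every fold, which matters
# when the minority class has only a handful of samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = RandomForestClassifier(class_weight="balanced", random_state=42)

scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
print("Balanced accuracy per fold:", scores)
print("Mean:", scores.mean())
```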
You can also generate synthetic data that is statistically similar to your original data (with caution).
Having said that, an ensemble algorithm such as AdaBoost with a reweighting method should be able to solve the problem. AdaBoost supports several simple and complex algorithms as its base classifier. With this you can find out which base classifier works best for your samples, as in the sketch below.
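A sketch of comparing AdaBoost with different base classifiers via cross-validation in scikit-learn; the base learners listed are only examples, and they must support sample weights.

```python
# Compare AdaBoost with different base classifiers (scikit-learn).
# Note: in scikit-learn < 1.2 the keyword is `base_estimator` instead of `estimator`.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=150, n_features=10, random_state=0)

base_learners = {
    "decision stump": DecisionTreeClassifier(max_depth=1),
    "shallow tree":   DecisionTreeClassifier(max_depth=3),
    "logistic reg.":  LogisticRegression(max_iter=1000),
}

for name, base in base_learners.items():
    ada = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)
    score = cross_val_score(ada, X, y, cv=5).mean()
    print(f"AdaBoost + {name}: {score:.3f}")
```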
If you have a very small dataset, I think you'd better use, if possible, a pre-trained model exploiting the following techniques, which are very useful when there is not enough data to build a full model from scratch, as in your case.
Transfer learning, used to transfer the abilities of a pre-trained model to another task.
Fine-tuning, used to incrementally adapt the pre-trained features to your specific dataset.
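A minimal transfer-learning and fine-tuning sketch in Keras, assuming an image task; the base network (MobileNetV2), the input size and the number of unfrozen layers are illustrative choices, not details from the question.

```python
# Transfer learning and fine-tuning sketch (Keras).
from tensorflow import keras
from tensorflow.keras import layers

# 1) Transfer learning: reuse a pre-trained feature extractor, train only the head.
base = keras.applications.MobileNetV2(weights="imagenet",
                                      include_top=False,
                                      input_shape=(160, 160, 3))
base.trainable = False  # freeze the pre-trained weights

inputs = keras.Input(shape=(160, 160, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # e.g. binary classification
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=10)   # train_ds is your own small dataset

# 2) Fine-tuning: unfreeze the top of the base network and continue training
#    with a much lower learning rate.
base.trainable = True
for layer in base.layers[:-20]:   # keep most layers frozen
    layer.trainable = False
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)
```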
If you can't proceed like this, you must train a new model from scratch, and the risk is overfitting your training data. To avoid this, you should avoid complex models and over-parameterization. In addition, you can (see the sketch after this list):
Enrich your data by adding synthetic samples (you can use an oversampling technique such as SMOTE)
Use an ensemble learning model, i.e. combine the predictions of different, weakly correlated weak learners according to the most appropriate strategy (boosting, voting, stacking/meta-learning)
Use some regularization mechanisms to avoid overfitting (such as dropout, L1, L2 and so on)
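A sketch combining the three ideas above with scikit-learn and the imbalanced-learn package (an extra dependency): SMOTE oversampling, a soft-voting ensemble of weakly correlated learners, and L2 regularization on the logistic member. The data and parameters are illustrative assumptions.

```python
# SMOTE + voting ensemble + regularization (scikit-learn / imbalanced-learn).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=120, n_features=10,
                           weights=[0.8, 0.2], random_state=1)

# Weakly correlated learners combined by soft voting;
# C controls the strength of L2 regularization on the logistic model.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(C=0.5, max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)

# SMOTE inside the pipeline so synthetic samples are generated only from
# the training folds during cross-validation (no leakage).
pipe = Pipeline([("smote", SMOTE(random_state=1)), ("ensemble", ensemble)])
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```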