How can I add noise to my classification datasets?

Oluwarotimi Williams Samuel

Dear Tao Lee,

I would suggest you do the following:

Assuming you have a total of 100 data samples stored in a variable named "Dataset":

The random noise can be added as follows:

1. Compute the random noise and assign it to a variable "Noise".

2. Add the noise to the dataset (Dataset = Dataset + Noise).

3. Partition the Noisy Dataset into three parts:

a). 70% for training (Training data)

b). 15% for validation (Validation data)

c). 15% for testing (Testing data)

Alternatively, you can partition the Noisy Dataset into two parts:

a). 75% for training (Training data)

b). 25% for testing (Testing data)

4. Then, you can use a classifier (Neural Network, SVM, LDA, ...) to classify the Dataset.

5. Finally, you can examine the classification accuracy of the classifier.
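The steps above can be sketched in plain Python as follows. This is only a minimal illustration under my own assumptions: the toy two-class data, the noise level `sigma=0.5`, and the nearest-centroid classifier (a simple stand-in for the Neural Network, SVM, or LDA mentioned above) are all hypothetical choices, and every function name is invented for the example.

```python
import random

def add_noise(rows, sigma, rng):
    """Steps 1-2: return a copy of rows with zero-mean Gaussian noise added to each value."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]

def split(rows, labels, fractions, rng):
    """Step 3: shuffle the sample indices, then cut them into consecutive parts."""
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    parts, start = [], 0
    for i, frac in enumerate(fractions):
        # The last part takes the remainder so no sample is lost to rounding.
        end = len(idx) if i == len(fractions) - 1 else start + round(frac * len(idx))
        chunk = idx[start:end]
        parts.append(([rows[j] for j in chunk], [labels[j] for j in chunk]))
        start = end
    return parts

def fit_centroids(rows, labels):
    """Step 4 (stand-in classifier): compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for k, v in enumerate(row):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(row, centroids[y])))

def accuracy(centroids, rows, labels):
    """Step 5: fraction of samples classified correctly."""
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

rng = random.Random(0)
# Hypothetical toy data: 100 samples, 2 features, classes centred at (0, 0) and (5, 5).
labels = [i % 2 for i in range(100)]
dataset = [[rng.gauss(5.0 * y, 1.0), rng.gauss(5.0 * y, 1.0)] for y in labels]

noisy = add_noise(dataset, sigma=0.5, rng=rng)
(train_x, train_y), (val_x, val_y), (test_x, test_y) = split(
    noisy, labels, [0.70, 0.15, 0.15], rng
)
model = fit_centroids(train_x, train_y)
val_acc = accuracy(model, val_x, val_y)
test_acc = accuracy(model, test_x, test_y)
print(f"validation accuracy: {val_acc:.2f}, test accuracy: {test_acc:.2f}")
```

For the two-part alternative, passing `[0.75, 0.25]` to `split` should give the 75%/25% partition.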

I hope this information is helpful to you.

Good luck.

Zeeshan Anwar

Add noise to the whole dataset, then select 70% for training and 30% for validation to prevent overfitting.

Tao Lee

Thanks to both of you, @Zeeshan Anwar and @Oluwarotimi Williams Samuel. I appreciate it!

Ebenezer R.H.P. Isaac

Oluwarotimi seems to have the entire learning process covered. I'll just expand a bit more on the part about adding the noise.

The usual type of noise added to a classification dataset is Gaussian noise. Provided your dataset's features/attributes consist of real numbers, it is actually a simple process:

Fix a scale factor w

Find the standard deviation s_f of each feature f

for each instance,

    for each feature value of feature f,

        choose a random number x taken from the interval (-s_f, s_f)

        add x / w to that feature value of the instance

Note that the scale factor w controls how much noise is added to your data. If w is too low, your dataset becomes too noisy and your machine learning algorithm may fail to converge. If w is too high, the noise itself becomes negligible.
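A minimal Python sketch of the procedure above, for anyone who wants to try it. The steps describe drawing x from an interval, which I read here as a uniform draw; for strictly Gaussian noise, replacing the draw with `rng.gauss(0.0, stds[f] / w)` would be a natural substitute, but that is my assumption rather than something stated above. The function name `add_scaled_noise`, the sample rows, and `w = 10` are hypothetical.

```python
import random
import statistics

def add_scaled_noise(rows, w, rng):
    """Return a copy of rows where each value has x / w added,
    with x drawn uniformly from (-s_f, s_f) for its feature f."""
    n_features = len(rows[0])
    # s_f: (population) standard deviation of each feature column
    stds = [statistics.pstdev([row[f] for row in rows]) for f in range(n_features)]
    return [
        [value + rng.uniform(-stds[f], stds[f]) / w for f, value in enumerate(row)]
        for row in rows
    ]

rng = random.Random(0)
# Hypothetical data: two real-valued features on very different scales.
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]]
w = 10
noisy = add_scaled_noise(rows, w, rng)
for original, perturbed in zip(rows, noisy):
    print(original, "->", [round(v, 3) for v in perturbed])
```

Because the noise is scaled by each feature's own standard deviation, a feature measured in hundreds receives proportionally larger perturbations than one measured in single digits.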

Hope this helps

Tao Lee

Thanks for all of your help, @Ebenezer R.H.P. Isaac and @Mahboobeh Parsapoor.

Bharat Sehgal

Dear Tao Lee,

It would be more realistic to train your model on a noise-free dataset and then test it on a noisy dataset.
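As a rough sketch of that idea, one could train on clean data and then check how accuracy changes as noise of increasing strength is added to the test set only. Everything here is an illustrative assumption: the toy data, the noise levels, and the nearest-centroid classifier standing in for whatever model you actually use.

```python
import random

rng = random.Random(1)

def make_data(n):
    """Hypothetical toy data: two classes centred at (0, 0) and (4, 4)."""
    labels = [i % 2 for i in range(n)]
    rows = [[rng.gauss(4.0 * y, 1.0), rng.gauss(4.0 * y, 1.0)] for y in labels]
    return rows, labels

def centroid(rows):
    """Mean feature vector of a list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(row, centroids[y])))

def accuracy(centroids, rows, labels):
    return sum(predict(centroids, r) == y for r, y in zip(rows, labels)) / len(rows)

train_x, train_y = make_data(200)
test_x, test_y = make_data(200)

# Train on the noise-free training set only.
centroids = {y: centroid([r for r, t in zip(train_x, train_y) if t == y]) for y in (0, 1)}

clean_acc = accuracy(centroids, test_x, test_y)
noisy_acc = {}
for sigma in (0.5, 1.0, 2.0, 4.0):
    # Noise is added to the test inputs only; the model never sees it in training.
    noisy_test = [[v + rng.gauss(0.0, sigma) for v in row] for row in test_x]
    noisy_acc[sigma] = accuracy(centroids, noisy_test, test_y)
    print(f"sigma={sigma}: accuracy {noisy_acc[sigma]:.2f} (clean: {clean_acc:.2f})")
```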

Good luck

Uduak Idio

Ebenezer R.H.P. Isaac has said it all. In my experience Gaussian noise is the best choice; I use it all the time.

Nyakno jimmy George

When little training data is available, one can add a small amount of noise to create a larger dataset. Each time a training sample is presented to the model, fresh random noise is added to the input variables, so the model sees a slightly different version of the sample every time. In this way, adding noise to input samples is a simple form of data augmentation. The best way is to normalize the values first and then add noise drawn from a Gaussian distribution.
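A small sketch of what this could look like in plain Python, assuming z-score normalization and a noise standard deviation of 0.1 (both are my own illustrative choices, as are the function names and the sample values):

```python
import random
import statistics

def standardize(rows):
    """Normalize each feature column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def noisy_epochs(rows, n_epochs, sigma, rng):
    """Yield one freshly perturbed copy of the data per epoch."""
    for _ in range(n_epochs):
        yield [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]

rng = random.Random(42)
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
clean = standardize(raw)

epochs = list(noisy_epochs(clean, n_epochs=3, sigma=0.1, rng=rng))
for i, batch in enumerate(epochs):
    # In a real training loop, the model would be updated on `batch` here.
    print(f"epoch {i}: first sample = {[round(v, 3) for v in batch[0]]}")
```

Since the values are normalized first, a single `sigma` means the same relative amount of noise for every feature.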
