Does anyone know how to add noise to my classification datasets? I am not sure whether the noise should go into the training set, the test set, or the whole dataset. I am also unclear on how to actually do it.
Oluwarotimi seems to have the entire learning process covered, so I'll just expand a bit on the part about adding noise.
The usual type of noise added to a classification dataset is Gaussian noise. Provided your dataset's features/attributes are real numbers, it is actually a simple process:
Fix a scale factor w
Find the standard deviation s_f of each feature f
for each instance,
    for each feature value of feature f,
        choose a random number x from the interval (-s_f, s_f)
        add x / w to that feature value
Note that the scale factor w determines how much noise can be added to your data, since the largest possible perturbation of feature f is s_f / w. If w is too low, your dataset becomes too noisy and your machine learning algorithm may not converge. If w is too high, the noise itself is negligible.
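The steps above can be sketched in Python roughly as follows. This is only an illustration, not a definitive implementation. The function name add_uniform_noise, the seed parameter and the toy data are made up for the example. The steps do not say how x is drawn from the interval, so the sketch assumes a uniform draw; for Gaussian noise you could swap that line for rng.gauss(0.0, s_f). It also assumes the population standard deviation and uses only the standard library.

```python
import random
import statistics

def add_uniform_noise(rows, w, seed=None):
    """Return a noisy copy of rows (a list of equal-length lists of floats)."""
    rng = random.Random(seed)
    # Standard deviation s_f of each feature (column); population std is an assumption.
    columns = list(zip(*rows))
    s = [statistics.pstdev(col) for col in columns]
    noisy = []
    for row in rows:
        # x is drawn uniformly from [-s_f, s_f]; x / w is what gets added.
        noisy.append([v + rng.uniform(-s_f, s_f) / w for v, s_f in zip(row, s)])
    return noisy

# Toy data: 4 instances, 2 real-valued features (made up for illustration).
data = [[1.0, 200.0], [2.0, 180.0], [3.0, 260.0], [4.0, 240.0]]
print(add_uniform_noise(data, w=10, seed=0))
```

With w=10 each value moves by at most a tenth of its feature's standard deviation, so trying a few values of w and watching validation accuracy is one way to pick it.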
When less training data is available, one can add a small amount of noise to create a larger dataset. Each time a training sample is shown to the model, random noise is added to its input variables, so the sample is different every time the model sees it. In this way, adding noise to input samples is a simple form of data augmentation. The best way is to normalize the values first and then add noise drawn from a Gaussian distribution.
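A minimal sketch of that idea, assuming "normalize" means scaling each feature to zero mean and unit variance, and assuming the noise is applied only to the training inputs. The helper names standardize and jitter, the value sigma=0.1 and the commented-out training call are all hypothetical and only there for illustration.

```python
import random
import statistics

def standardize(rows):
    """Scale each feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def jitter(rows, sigma, rng):
    """Return a copy of rows with Gaussian noise N(0, sigma^2) added to every value."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]

rng = random.Random(42)
train = standardize([[1.0, 200.0], [2.0, 180.0], [3.0, 260.0], [4.0, 240.0]])
for epoch in range(3):
    # Fresh noise every epoch, so the model never sees exactly the same inputs twice.
    noisy_train = jitter(train, sigma=0.1, rng=rng)
    # model.fit_one_epoch(noisy_train, labels)  # hypothetical training step
```

Because the features are standardized first, a single sigma means the same relative amount of noise for every feature.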