How to add some noise data to my classification datasets?

Oluwarotimi Williams Samuel Popular answer

Dear Tao Lee,

I would suggest you do the following:

Assuming you have a total of 100 data samples in a set named "Dataset":

The random noise can be added as follows:

1. Compute the random noise and assign it to a variable "Noise".

2. Add the noise to the dataset (Dataset = Dataset + Noise).

3. Partition the Noisy Dataset into three parts:

a). 70% for training (Training data)

b). 15% for validation (Validation data)

c). 15% for testing (Testing data)

Alternatively, you can partition the Noisy Dataset into two parts:

a). 75% for training (Training data)

b). 25% for testing (Testing data)

4. Then, you can use a classifier (Neural Network, SVM, LDA, ...) to classify the Dataset.

5. Finally, you can examine the classification accuracy of the classifier.
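Steps 1 to 3 above can be sketched in a few lines of Python. This is only an illustration under assumptions not stated in the answer: the dataset is a made-up list of 100 rows with 4 real-valued features, the noise is zero-mean Gaussian with an arbitrary standard deviation of 0.5, and the helper names `add_gaussian_noise` and `split_indices` are hypothetical. Only the features are perturbed; the labels are left unchanged.

```python
import random

def add_gaussian_noise(rows, sigma, rng):
    """Return a copy of rows with zero-mean Gaussian noise added to every value."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]

def split_indices(n, train_frac=0.70, val_frac=0.15, rng=None):
    """Shuffle the indices 0..n-1 and cut them into train/validation/test parts."""
    rng = rng or random.Random()
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(round(train_frac * n))
    n_val = int(round(val_frac * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

rng = random.Random(0)

# Hypothetical dataset: 100 samples, 4 real-valued features, binary labels.
dataset = [[rng.uniform(0.0, 10.0) for _ in range(4)] for _ in range(100)]
labels = [i % 2 for i in range(100)]

# Steps 1-2: compute the noise and add it to the dataset.
noisy = add_gaussian_noise(dataset, sigma=0.5, rng=rng)

# Step 3: partition into 70% training, 15% validation, 15% testing.
train_idx, val_idx, test_idx = split_indices(len(noisy), rng=rng)
X_train = [noisy[i] for i in train_idx]
y_train = [labels[i] for i in train_idx]
```

The training, validation and test subsets can then be passed to whichever classifier is used in step 4.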

I hope this information will be helpful for you

Good luck.

Zeeshan Anwar

Add noise data to the whole set, then select 70% for training and 30% for validation to prevent overfitting.

Tao Lee

Thanks to all of you, @Zeeshan Anwar and @Oluwarotimi Williams Samuel. I appreciate it!

Ebenezer R.H.P. Isaac

Oluwarotimi seems to have the entire learning process covered. I'll just expand a bit on the part about adding the noise.

The usual type of noise added to a classification dataset is Gaussian noise. Provided your dataset's features/attributes are real numbers, it is actually a simple process:

1. Fix a scale factor w.

2. Find the standard deviation s_f of each feature f.

3. For each instance, and for each feature f of that instance:

a). choose a random number x from the interval (-s_f, s_f);

b). add x / w to that feature value.

Note that the scale factor w determines the degree of noise added to your data. If w is too low, your dataset becomes too noisy and your machine learning algorithm may not converge. If w is too high, the noise itself becomes negligible.
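The procedure above can be sketched in Python as follows. The function names and the `gaussian` flag are hypothetical, and the population standard deviation is an arbitrary choice. One caveat worth stating: drawing x uniformly from (-s_f, s_f), as the steps literally read, gives uniform rather than Gaussian noise. The `gaussian=True` branch is an assumed variant that draws from a normal distribution with mean 0 and standard deviation s_f / w instead.

```python
import random
import statistics

def feature_stds(rows):
    """Population standard deviation of each column (feature) of a list of rows."""
    return [statistics.pstdev(col) for col in zip(*rows)]

def add_scaled_noise(rows, w, rng, gaussian=False):
    """Add per-feature noise scaled by 1 / w.

    gaussian=False: x ~ Uniform(-s_f, s_f), then add x / w (the steps as written).
    gaussian=True:  add a draw from Normal(0, s_f / w) (an assumed variant).
    """
    stds = feature_stds(rows)
    noisy = []
    for row in rows:
        new_row = []
        for value, s in zip(row, stds):
            x = rng.gauss(0.0, s) if gaussian else rng.uniform(-s, s)
            new_row.append(value + x / w)
        noisy.append(new_row)
    return noisy

# Hypothetical data: 4 instances, 2 features on very different scales.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
rng = random.Random(1)
noisy_uniform = add_scaled_noise(data, w=10.0, rng=rng)
noisy_gaussian = add_scaled_noise(data, w=10.0, rng=rng, gaussian=True)
```

Because the noise is scaled by each feature's own standard deviation, features measured on different scales are perturbed proportionally.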

Hope this helps

Thanks for all of your help, @Ebenezer R.H.P. Isaac and @Mahboobeh Parsapoor.

Bharat Sehgal

It would be more realistic to train your model on a noise-free dataset and then test it on a noisy dataset.
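A minimal sketch of that evaluation setup, under several assumptions: the data are two made-up 2-D Gaussian clusters, the noise level of 20 is deliberately extreme, and a hand-written nearest-centroid classifier stands in for whatever model is actually used. All function names are hypothetical.

```python
import random

def make_blobs(n_per_class, rng):
    """Hypothetical toy data: two 2-D Gaussian clusters centred at (0, 0) and (10, 10)."""
    X, y = [], []
    for label, centre in enumerate([0.0, 10.0]):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'training': average the feature vectors of each class."""
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def accuracy(centroids, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

def add_noise(X, sigma, rng):
    """Return a copy of X with zero-mean Gaussian noise added to every value."""
    return [[v + rng.gauss(0.0, sigma) for v in x] for x in X]

rng = random.Random(0)
X_train, y_train = make_blobs(100, rng)
X_test, y_test = make_blobs(50, rng)

model = fit_centroids(X_train, y_train)              # trained on clean data only
acc_clean = accuracy(model, X_test, y_test)          # baseline on clean test data
acc_noisy = accuracy(model, add_noise(X_test, 20.0, rng), y_test)
```

Comparing `acc_clean` with `acc_noisy` at several noise levels gives a rough picture of how robust the trained model is.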

Good luck

Uduak Idio

Ebenezer R.H.P. Isaac has said it all. Gaussian noise is the best choice; I use it all the time.

Nyakno jimmy George

When less training data is available, one can add a small amount of noise to create a larger dataset. Each time a training sample is presented to the model, random noise is added to its input variables, so the sample is different every time the model sees it. In this way, adding noise to input samples is a simple form of data augmentation. The best way is to normalize the values and then add noise drawn from a Gaussian distribution.
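As a rough sketch of this idea, the snippet below standardizes each feature (zero mean, unit standard deviation) and then yields a freshly perturbed copy of the training data for every epoch. The function names, the toy data and the noise level of 0.1 are all assumptions made for illustration.

```python
import random
import statistics

def standardize(rows):
    """Rescale each feature to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def noisy_epochs(rows, n_epochs, sigma, rng):
    """Yield one freshly perturbed copy of the training data per epoch."""
    for _ in range(n_epochs):
        yield [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]

# Hypothetical training data: 3 samples, 2 features on different scales.
raw = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
X = standardize(raw)

rng = random.Random(0)
epochs = list(noisy_epochs(X, n_epochs=3, sigma=0.1, rng=rng))
```

After standardization a single `sigma` means the same thing for every feature, which is why normalizing first is convenient. In a real training loop each yielded copy would be fed to the model for one epoch rather than collected in a list.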

Shrey Sukhadia

Hi, Ebenezer R.H.P. Isaac

What if I want to avoid negative values when generating random numbers in (-stddev, +stddev), i.e. generate random numbers only in (0, +stddev)? Would that still be valid Gaussian noise? I ask because the features I need to add noise to have mostly positive values (only a few, maybe 10 out of 700, have some small negative values). This depends purely on the range of the feature values: the majority fall in (0, +real_numbers). So I am afraid that if I add negative noise to these values (which I did earlier), many feature values will go outside their normal operating range, and I will end up generating simulated data that my model may never encounter in the future.

Hence, I am thinking of restricting the noise to random numbers in the range (0, +STDDEV). Of course, I would divide them by the scale factor 'w' and then add them to the original feature values to get the simulated values.

Note: When I said that adding negative noise to feature values results in negative feature values, that negative noise already took the scale factor 'w' into account.

Please let me know whether using (0, +STDDEV) along with the scale factor 'w' would still preserve the Gaussian property of the noise.

Thanks,

Shrey
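A quick numerical check can help frame the question above. This is only a sketch with made-up values for the feature standard deviation `s`, the scale factor `w` and the sample count. It compares noise drawn only from (0, s) with zero-mean Gaussian noise. The one-sided noise has a mean near s / (2w) rather than 0, so it shifts every feature upward and is no longer zero-mean. The helper `add_noise_clipped` is a hypothetical alternative: add zero-mean noise, then clip at the lower bound of the feature's valid range. Clipping also distorts the distribution somewhat, since values pile up at the bound.

```python
import random
import statistics

rng = random.Random(0)
s, w, n = 2.0, 4.0, 20000   # hypothetical feature std, scale factor, sample count

# One-sided noise as proposed in the question: x in (0, s), then x / w.
one_sided = [rng.uniform(0.0, s) / w for _ in range(n)]

# Zero-mean Gaussian noise with standard deviation s / w, for comparison.
symmetric = [rng.gauss(0.0, s / w) for _ in range(n)]

mean_one_sided = statistics.fmean(one_sided)   # close to s / (2 * w), not 0
mean_symmetric = statistics.fmean(symmetric)   # close to 0

def add_noise_clipped(value, sigma, lower, rng):
    """Add zero-mean Gaussian noise, then clip so the result never drops below lower."""
    return max(lower, value + rng.gauss(0.0, sigma))

# Example: a small positive feature value stays non-negative after noise is added.
clipped = [add_noise_clipped(0.1, s / w, 0.0, rng) for _ in range(1000)]
```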

Mehrnoosh Nobakht

Here is a useful link with an example:

https://stackoverflow.com/questions/46093073/adding-gaussian-noise-to-a-dataset-of-floating-points-and-save-it-python