Hi,

I'm writing a paper about data augmentation and I'm looking for ways of increasing the size of a dataset. I'm already aware of the techniques used for images (transformations, PCA, blurs, etc.), but I'm having trouble finding augmentation techniques for classic numeric datasets such as Iris.

I have already explored some obvious methods, listed below (rough code sketches of each one follow the list):

1. Uniform Random Generation: This very naive method creates a new instance by drawing each feature value uniformly at random between the min and max of the existing values. (The min and max are computed each time from the values of that feature within the relevant class.)

2. Normal Random Generation: Same as Uniform, but the values are now drawn from a Gaussian, which is of course less naive since the generated values better fit the initial data distribution.

3. Adding Noise: This method is a little bit different, since it consists of cloning the initial instances and adding some noise to each copy. This method aims to strengthen the model and prevent overfitting.
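
To make 1. concrete, here is a minimal sketch of what I mean, assuming NumPy and scikit-learn's Iris loader (the function name augment_uniform and the per-class loop are just my own illustration):

import numpy as np
from sklearn.datasets import load_iris

def augment_uniform(X, y, n_new_per_class, seed=0):
    """Draw each feature uniformly between its per-class min and max."""
    rng = np.random.default_rng(seed)
    new_X, new_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]                      # all instances of class c
        lo, hi = Xc.min(axis=0), Xc.max(axis=0)
        new_X.append(rng.uniform(lo, hi, size=(n_new_per_class, X.shape[1])))
        new_y.append(np.full(n_new_per_class, c))
    return np.vstack(new_X), np.concatenate(new_y)

# Example: add 50 synthetic instances per Iris class, stacked onto the original data
X, y = load_iris(return_X_y=True)
X_new, y_new = augment_uniform(X, y, n_new_per_class=50)
X_aug = np.vstack([X, X_new])
y_aug = np.concatenate([y, y_new])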
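
For 2., the only change in my sketch is fitting a per-class mean and standard deviation instead of a min/max; note that I treat the features as independent, so correlations between features are ignored:

import numpy as np

def augment_gaussian(X, y, n_new_per_class, seed=0):
    """Draw each feature from a Gaussian fitted per class (features independent)."""
    rng = np.random.default_rng(seed)
    new_X, new_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        mu, sigma = Xc.mean(axis=0), Xc.std(axis=0)   # per-class, per-feature fit
        new_X.append(rng.normal(mu, sigma, size=(n_new_per_class, X.shape[1])))
        new_y.append(np.full(n_new_per_class, c))
    return np.vstack(new_X), np.concatenate(new_y)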
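
And for 3., I simply clone the existing rows and perturb them with small Gaussian noise scaled by each feature's standard deviation (the 5% scale below is an arbitrary choice of mine):

import numpy as np

def augment_noise(X, y, noise_scale=0.05, seed=0):
    """Clone every instance and add Gaussian noise scaled by each feature's std."""
    rng = np.random.default_rng(seed)
    sigma = X.std(axis=0) * noise_scale     # per-feature noise level
    X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
    return X_noisy, y.copy()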

I'm fairly new to ML and data augmentation, and these 3 methods may look pretty obvious, so I was wondering if anyone had ideas for different methods/algorithms for such a task.

Thanks for your advice.

DP

NB: Text augmentation is out of my project's scope, but if you have any info on that, please share!
