I want to know if there is some way to combine datasets with different numbers of instances into one, so that the final dataset is not biased toward the larger datasets.

For instance, suppose we have datasets A, B, and C with 2000, 500, and 50 instances, respectively. If we simply combine them into a dataset D with 2550 instances, the result would be biased toward A, which has 2000 instances.

Since I would like to use statistical tests such as the Wilcoxon signed-rank test, my idea is to replicate (resample) the smaller datasets until each one has the same number of instances as the largest.

Following the example, dataset D would have 6000 instances: 2000 instances of A, 4 × 500 instances of B, and 40 × 50 instances of C.
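The replication scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `balance_by_replication` is hypothetical, each smaller dataset is copied whole floor(n_max / n) times, and, as an assumed design choice not specified in the question, any leftover remainder (when n_max is not an exact multiple of n) is filled with a random sample drawn without replacement.

```python
import random


def balance_by_replication(datasets, seed=0):
    """Oversample every dataset to the size of the largest one (hypothetical helper).

    Each dataset is copied whole floor(n_max / n) times. If n_max is not an
    exact multiple of n, the remainder is filled with a random sample drawn
    without replacement from that dataset (an assumed choice).
    """
    rng = random.Random(seed)  # fixed seed so the remainder sampling is reproducible
    n_max = max(len(d) for d in datasets)
    combined = []
    for d in datasets:
        items = list(d)
        copies, remainder = divmod(n_max, len(items))
        combined.extend(items * copies)                  # whole-dataset replication
        combined.extend(rng.sample(items, remainder))    # top up to exactly n_max
    return combined


# The A/B/C example from the question, tagging each instance with its source.
A = [("A", i) for i in range(2000)]
B = [("B", i) for i in range(500)]
C = [("C", i) for i in range(50)]
D = balance_by_replication([A, B, C])
print(len(D))  # 2000 + 4 * 500 + 40 * 50
```

With the sizes in the example, B is copied 4 times and C 40 times, so each source contributes 2000 instances and D has 6000 in total. Note that the replicated instances are exact duplicates, not new observations.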

Does this make sense?

https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
