This is a weird question on a taboo topic.

I have been discussing some issues about data manipulation with colleagues. Some of them believe that considerable manipulation occurs in the statistical description of experimental data when unethical researchers want to "prove" their point with statistical analysis. This is made easier by the traditional practice of not publishing the raw data behind statistical tests and data descriptions.

However, asking to see the raw data is often prized as the ultimate test of veracity. My friends insist there must be a simple tool, even in Excel, for generating random numbers that fit any given (plausible) description of mean +/- SD within an interval, which could be used to bypass such proof tests by giving a false superficial impression of data veracity. I suspect that such generated random numbers would not fit statistical tests perfectly, especially if the reported summary values were themselves manipulated. This comparison sounds like an interesting way of using the same tool to double-check statistical data, and it seems like it could be automated and even applied to randomly chosen published literature as a (very controversial, yet interesting) screening test.
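For what it's worth, such a tool is indeed trivial to build, which supports my friends' point. Below is a minimal sketch in Python with NumPy (the function name `fake_sample` and the rejection-loop approach are my own illustration, not any published method): it draws normal deviates, rescales them so the sample mean and sample SD match the targets exactly, and redraws if any value falls outside the stated interval.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_sample(n, target_mean, target_sd, low, high, max_tries=10_000):
    """Generate n values with the exact given sample mean and SD,
    all lying within [low, high]. Illustrative sketch only."""
    for _ in range(max_tries):
        x = rng.normal(size=n)
        # Standardize to sample mean 0 and sample SD 1 (ddof=1),
        # then rescale so the summary statistics match exactly.
        x = (x - x.mean()) / x.std(ddof=1)
        x = x * target_sd + target_mean
        if x.min() >= low and x.max() <= high:
            return x
    raise ValueError("bounds too tight for this mean/SD; target implausible")

# Example: 30 "measurements" reported as 5.0 +/- 1.2 on a 0-10 scale.
data = fake_sample(30, 5.0, 1.2, 0, 10)
```

Note that the rejection loop is where implausible summaries betray themselves: if the stated SD is too large for the stated interval, no sample can satisfy both, which hints at how the same logic could be inverted into a consistency check.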

This is an awkward idea that crossed my mind, and it made me curious. I could not find any discussion of this, yet it seems relevant.

Maybe others here would know more about this?
