I have a statistical question:

For a research project, I collected data (i.e., I recorded sounds) and a master's student is processing it (i.e., measuring each sound), but processing the data takes a lot of time. For his master's thesis, he used a sample of X sounds and ran preliminary statistics.

Now we want to continue the work in order to publish it, and we wonder whether we need to process more data and, if so, how much, so that i) he does not spend too much time coding "unnecessary" sounds and ii) we have reliable results.

I think that a power analysis could be the answer, but I read everywhere that a power analysis must only be conducted on pilot data. Can we consider the X sounds he coded to be pilot data? If so, can we still include them in the final article if we run a power analysis on this sample?

In other words: can we do a power analysis during the study to estimate when he can stop processing data?

Importantly: the stats he conducted for his MSc thesis are not the ones we will keep in the article, because they are simple and probably a little wrong (he was in a rush to finish before the deadline!), and I want to run more elegant tests for the final article. So we are completely blind to the results: we have no p-value or anything else yet. I mention this just to clarify that we are not trying to p-hack the paper! :)

Many thanks for your input!
