There is a clear distinction between exploratory and confirmatory research. In exploratory research, we explore our data to discover interesting trends or associations, perhaps by analysing it in multiple different ways. However, this multiplicity of analytical approaches means we cannot control the Type I error rate, so such exploratory research cannot have evidential value. Confirmatory research, on the other hand, involves making specific, well-defined predictions in advance that are then directly tested through the research. The paper attached to this question, originally published by de Groot in 1956 and translated by Wagenmakers in 2014, gives a fuller description of this distinction between exploratory and confirmatory research.
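To give a rough sense of why multiplicity matters, here is a minimal sketch of how the chance of at least one false positive grows with the number of tests. It assumes the tests are independent and each uses the same significance level, which is a simplification: analyses run on the same data are usually correlated, so the real figure will differ. The function name `familywise_error_rate` is just my own label for illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n_tests tests.

    Assumes the tests are independent and each is run at level alpha.
    """
    return 1 - (1 - alpha) ** n_tests


# Illustrative values only: 1, 5 and 20 tests at alpha = 0.05.
for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

Under these assumptions, twenty tests at the 0.05 level give roughly a 64% chance of at least one spurious "finding".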

I have a large set of secondary data (collision records for a country, going back 20 years). My question is whether it would be OK to carry out both exploratory and confirmatory research using this dataset. First, I would randomly select half of the data and carry out the exploratory analysis on it. This would lead to the development of specific hypotheses, which would then be tested using the other half of the dataset, which was not involved in the exploratory analysis.
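The split I have in mind could be sketched roughly as follows. This is only an illustration using the Python standard library: the `records` list is a made-up stand-in for the collision data, and the function name, field names and seed are all hypothetical.

```python
import random


def split_exploratory_confirmatory(records, seed=2024):
    """Randomly split records into two disjoint halves.

    A fixed seed is used so that the same split can be reproduced later.
    """
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    half = len(indices) // 2
    exploratory = [records[i] for i in indices[:half]]
    confirmatory = [records[i] for i in indices[half:]]
    return exploratory, confirmatory


# Hypothetical stand-in for the real data: one dict per collision record.
records = [{"collision_id": i, "year": 2000 + i % 20} for i in range(1000)]

exploratory, confirmatory = split_exploratory_confirmatory(records)
print(len(exploratory), len(confirmatory))
```

The idea is that the confirmatory half is set aside and not examined at all until the hypotheses from the exploratory half have been written down.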

Any comments or thoughts are gratefully received.
