Is it reasonable to use half your dataset to carry out exploratory analysis leading to specific hypotheses, and then test these using the other half?

Usman Rashid

Hi Jim

This is a very interesting question and I would also like to know the answer and read the opinion of different experts.

In the machine learning literature, it is common, well-established and accepted practice to divide the data into subsets such as training, validation and test sets. The model is trained and fine-tuned on the training set (with the validation set used for tuning), and its effectiveness is shown on the test set. The underlying null hypothesis evaluated on the test set is that the trained model is no better than a random guess.

Could you use the same approach for your collision dataset? For example, you could propose a model on the first half that makes a specific prediction about unseen future collisions, and then test it on the second half.
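As a rough illustration of testing a model against the "no better than a random guess" null on held-out data, here is a minimal sketch using an exact one-sided binomial test. The function name `binomial_p_value`, the 50% chance level and the counts (60 correct out of 100) are assumptions made up for the example, not values from this thread.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact p-value: P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical held-out result: 60 of 100 test cases predicted correctly,
# compared against a 50% chance level (both numbers are illustrative).
p = binomial_p_value(60, 100)
print(f"one-sided p-value against chance: {p:.4f}")
```

The appropriate chance level depends on the outcome being predicted; 0.5 only makes sense for a balanced two-class problem.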

Jim Uttley

Hi Usman,

Thanks for the response, it's interesting to know this type of data-splitting is used in machine learning. This is essentially the kind of thing I had in mind. My plan though would not be to use data from period x to then predict period y, but to take a subsample of data from periods x and y, explore that data, then make specific predictions or hypotheses which I would test using the data from x and y that I did not originally sample. Hope that makes sense! I'm trying to get a sense of whether this is a sensible thing to do or not, and any precedents for this type of approach (e.g. machine learning, as you've pointed out).

Thanks,

Jim
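The split described above (a random subsample drawn from both periods, with the remainder held back for testing) might look something like the following sketch. The helper `split_half`, the record layout and the `period` field are hypothetical; the only idea taken from the thread is that the exploratory and confirmatory halves each contain data from periods x and y.

```python
import random
from collections import defaultdict


def split_half(records, key, seed=0):
    """Randomly split records into two halves, separately within each group."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    explore, confirm = [], []
    for members in groups.values():
        rng.shuffle(members)
        mid = len(members) // 2
        explore.extend(members[:mid])  # used for exploratory analysis
        confirm.extend(members[mid:])  # untouched until hypotheses are fixed
    return explore, confirm


# Hypothetical data: 100 records from period "x" and 100 from period "y".
records = [{"id": i, "period": "x" if i < 100 else "y"} for i in range(200)]
explore, confirm = split_half(records, key=lambda r: r["period"])
print(len(explore), len(confirm))
```

Splitting within each period keeps both halves balanced across x and y. Fixing the seed and recording it means the confirmatory half can be shown not to have been inspected during exploration.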

In such a situation, I would call my research an exploratory study. Instead of focusing on hypothesis testing, I would focus on illustrating the different relationships in the data with descriptive statistics, fitting regression models, and reporting estimates with confidence intervals for the different variables. To evaluate goodness of fit, I would use partial R-squared for individual variables and the Akaike information criterion (AIC) for model comparison, instead of relying on p-values.

Such an analysis would generate a rich set of useful hypotheses for future research. But as I said earlier, it's an interesting question and I want to know more.
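To make the AIC comparison concrete, here is a small sketch that compares an intercept-only model with a straight-line model on made-up data. It uses the least-squares form of AIC under Gaussian errors, n·ln(RSS/n) + 2k, which omits an additive constant shared by both models. The data, the helper names and the choice of counting the error variance as a parameter are all assumptions for illustration.

```python
import math


def gaussian_aic(residuals, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss / n) + 2 * n_params


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Illustrative data only (not from any real study).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

mean_y = sum(y) / len(y)
intercept, slope = fit_line(x, y)

res_null = [b - mean_y for b in y]                          # intercept-only model
res_line = [b - (intercept + slope * a) for a, b in zip(x, y)]

aic_null = gaussian_aic(res_null, n_params=2)  # mean + error variance
aic_line = gaussian_aic(res_line, n_params=3)  # intercept + slope + error variance
print(f"AIC (intercept only): {aic_null:.2f}")
print(f"AIC (with predictor): {aic_line:.2f}")
```

The model with the lower AIC is preferred. Only differences in AIC between models fitted to the same data are meaningful, not the absolute values.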

Rajaram Bhagavathula

Hi Jim,

I have used the approach that you mentioned (splitting the data set) when checking the predictive performance of a modelling technique. I used it in instances where I had to predict whether a driver should get a warning, depending on their speeding/acceleration profile when approaching a signalized intersection. In these cases I was working with large amounts of data, so splitting the data set did not substantially reduce the power of my sample.

In my opinion, as long as you specifically state the goals of your analysis, be it confirmatory or exploratory, before presenting your results/conclusions, it should be okay. I agree with Usman Rashid about calling your research exploratory or confirmatory. I can also see a situation where you run a hypothesis test (look for differences) on the data to see which factors have major effects on your dependent measures, then run an exploratory analysis by splitting the data set as a supplementary analysis, and state that more data is required to substantiate the evidence from the exploratory modelling. This essentially lays the foundation for a future analysis.

I feel like there is no wrong answer as long as you clearly specify beforehand the reasons for selecting a specific analysis and mention the limitations of your analyses.

Hope this is helpful.

-Raj