Dataset size and the need for cross-validation (medical domain)?

Victor Henrique Alves Ribeiro

Dear Young-Tak Kim

Cross-validation is one of the best tools for assessing model generalization.

However, if computational cost is the limiting factor in your application, you could employ hold-out validation instead, especially since you have a lot of samples.

In one step of a recent paper, I employed a 70%/30% hold-out validation: Ensemble learning by means of a multi-objective optimization...

It would also be worthwhile to split at the patient level, so that no patient's data appears in both the training and testing sets. You could use data from 200 patients for training and 100 for testing. You should still try to maintain the class ratio, though.
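As a rough illustration of the patient-level split described above, here is a minimal NumPy sketch. The cohort (300 patients, 30% positive, 5 samples each), the function name `patient_holdout_split`, and the array `sample_patient` are all assumptions made for the example, not details from the question. It also assumes each patient has a single class label.

```python
import numpy as np

def patient_holdout_split(patient_ids, patient_labels, n_test, seed=0):
    """Split patients (not samples) into train/test, roughly keeping the class ratio.

    patient_ids:    one ID per patient
    patient_labels: one class label per patient (assumed constant per patient)
    n_test:         approximate number of patients to hold out for testing
    """
    rng = np.random.default_rng(seed)
    patient_ids = np.asarray(patient_ids)
    patient_labels = np.asarray(patient_labels)
    test_fraction = n_test / len(patient_ids)

    test_ids = []
    for label in np.unique(patient_labels):
        # Shuffle the patients of this class and hold out the same fraction.
        ids = rng.permutation(patient_ids[patient_labels == label])
        n_take = int(round(test_fraction * len(ids)))
        test_ids.extend(ids[:n_take])
    test_ids = np.array(test_ids)
    train_ids = np.setdiff1d(patient_ids, test_ids)
    return train_ids, test_ids

# Hypothetical cohort: 300 patients, 30% in the positive class.
patient_ids = np.arange(300)
patient_labels = np.array([1] * 90 + [0] * 210)
train_ids, test_ids = patient_holdout_split(patient_ids, patient_labels, n_test=100)

# Map the patient split back to samples: every recording of a patient
# lands on the same side. Assumed here: 5 samples per patient.
sample_patient = np.repeat(patient_ids, 5)
train_mask = np.isin(sample_patient, train_ids)
test_mask = np.isin(sample_patient, test_ids)
```

Because the per-class counts are rounded, the held-out set may differ from `n_test` by a patient or two for other cohort sizes.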

Regarding your last question, it is difficult for me to answer without knowing more about the data.

Muhammad Ali

Related to your query, I suggest you follow: https://www.coursera.org/lecture/industrial-iot-project-planning-machine-learning/segment-10-cross-validation-9Pf98

https://medium.com/next-gen-machine-learning/types-of-cross-validation-in-machine-learning-8bd33bf3e12f

Aki Koivu

If you have a total of 300 patient observations, I would highly recommend cross-validation regardless of how much data you have per patient. As Victor Henrique Alves Ribeiro suggested, hold-out validation is also a good choice.
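If each patient contributes several samples, one way to run cross-validation is to form the folds over patients rather than over samples. The sketch below does this with NumPy only. The function name `patient_kfold` and the layout (300 patients, 4 samples each) are assumptions for illustration, and the folds are not stratified by class.

```python
import numpy as np

def patient_kfold(sample_patient, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) sample indices, folding over patients."""
    rng = np.random.default_rng(seed)
    sample_patient = np.asarray(sample_patient)
    # Shuffle the unique patient IDs, then cut them into n_splits groups.
    patients = rng.permutation(np.unique(sample_patient))
    for fold_patients in np.array_split(patients, n_splits):
        test_mask = np.isin(sample_patient, fold_patients)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical layout: 300 patients with 4 samples each.
sample_patient = np.repeat(np.arange(300), 4)
folds = list(patient_kfold(sample_patient, n_splits=5))

for train_idx, test_idx in folds:
    # No patient contributes samples to both sides of a fold.
    shared = set(sample_patient[train_idx]) & set(sample_patient[test_idx])
    assert not shared
```

If scikit-learn is available, its `GroupKFold` and `StratifiedGroupKFold` splitters cover the same idea, the latter also trying to preserve the class ratio in each fold.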

Is your pulse data periodic, in other words, are your features time-dependent? If so, try techniques such as the Fourier transform to reduce the number of features per patient (feed the frequency components as features to your deep NN model).
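To make the Fourier suggestion concrete, here is a small sketch that turns a time series into a handful of low-frequency magnitudes. The synthetic pulse trace, the sampling rate, the number of bins, and the function name `fft_features` are all assumptions for the example. Real pulse data would likely need detrending or windowing first.

```python
import numpy as np

def fft_features(signal, n_bins=16):
    """Return normalized magnitudes of the first n_bins non-DC frequency bins."""
    signal = np.asarray(signal, dtype=float)
    # Remove the mean so the DC component does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    return spectrum[1:n_bins + 1] / len(signal)

# Hypothetical pulse trace: 10 s sampled at 100 Hz, 1.2 Hz beat (72 bpm)
# plus a weaker harmonic at 2.4 Hz.
fs = 100.0
t = np.arange(1000) / fs
pulse = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)

# 1000 raw samples are reduced to 32 frequency features.
features = fft_features(pulse, n_bins=32)
freqs = np.fft.rfftfreq(len(pulse), d=1 / fs)[1:33]
dominant_hz = freqs[np.argmax(features)]
```

Here the 1000-point trace is reduced to 32 inputs for the model, and `dominant_hz` recovers the beat frequency of the synthetic signal.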