I am currently developing a deep-learning model that classifies pulse waveforms as normal or pathologic.

The dataset currently contains about 2,000,000 pulse waveforms, and the ratio of normal to pathologic labels is about 3:1.
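For reference, here is a quick worked calculation of what a 3:1 ratio means for about 2,000,000 pulses. It also shows why accuracy alone can be misleading here: a trivial model that calls every pulse "normal" is already right 75% of the time. The counts are approximations derived from the rounded figures above.

```python
# Approximate figures from the question: ~2,000,000 pulses, normal:pathologic ~3:1.
TOTAL_PULSES = 2_000_000
RATIO_NORMAL, RATIO_PATHOLOGIC = 3, 1


def class_counts(total, ratio_a, ratio_b):
    """Split `total` samples according to an a:b ratio (integer counts)."""
    n_a = total * ratio_a // (ratio_a + ratio_b)
    return n_a, total - n_a


n_normal, n_pathologic = class_counts(TOTAL_PULSES, RATIO_NORMAL, RATIO_PATHOLOGIC)

# A trivial model that labels every pulse "normal" is correct on every normal pulse.
majority_baseline_accuracy = n_normal / TOTAL_PULSES

print(n_normal, n_pathologic)      # 1500000 500000
print(majority_baseline_accuracy)  # 0.75
```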

Stratified 10-fold cross-validation is too time-consuming and too computationally demanding for my computer system.

So my question is:

When building a deep-learning model with about 2 million samples, is cross-validation strictly necessary?
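A cheaper alternative that is often used with very large datasets is a single stratified hold-out split (train / validation / test), which trains the model once instead of ten times. The sketch below is plain Python and assumes an 80/10/10 split and a fixed seed; the function name `stratified_split` is hypothetical. Note that it splits individual pulses, not patients.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists, preserving each label's proportion."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Group sample indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(n * fractions[0])
        n_val = round(n * fractions[1])
        # Each label contributes the same fractions to every split.
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])
    return train, val, test


# Toy example: 4,000 labels at the 3:1 ratio (0 = normal, 1 = pathologic).
labels = [0] * 3000 + [1] * 1000
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 3200 400 400
```

With 2 million pulses, even a 10% test set would hold roughly 200,000 pulses, which is one reason a single hold-out is often considered acceptable at this scale.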

I would also like to ask one more question.

To further evaluate the applicability of my deep-learning model, I want to predict patient outcomes from its pulse-classification results (the 2 million pulses were extracted from about 300 patients).
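One way to turn pulse-level classifications into a patient-level predictor is to summarize each patient by the fraction of their pulses that the model labels pathologic; that per-patient score could then be related to outcomes. This is a minimal sketch assuming the model outputs a 0/1 label per pulse and every pulse carries a patient ID; the function name and toy data are hypothetical.

```python
from collections import defaultdict


def pathologic_fraction_per_patient(patient_ids, predicted_labels):
    """Fraction of each patient's pulses that the model labels pathologic (1)."""
    totals = defaultdict(int)
    pathologic = defaultdict(int)
    for pid, pred in zip(patient_ids, predicted_labels):
        totals[pid] += 1
        pathologic[pid] += pred
    return {pid: pathologic[pid] / totals[pid] for pid in totals}


# Hypothetical toy data: three patients, pulse-level predictions (0 = normal, 1 = pathologic).
patient_ids = ["p1", "p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"]
predictions = [0, 0, 1, 0, 1, 1, 0, 0, 0]
print(pathologic_fraction_per_patient(patient_ids, predictions))
# {'p1': 0.25, 'p2': 1.0, 'p3': 0.0}
```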

Does this approach make sense?

Patients with worse outcomes inevitably have many pathologic pulses, so I am concerned that the deep-learning model would effectively be "cheating".
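Because the 2 million pulses come from only about 300 patients, pulses from the same patient are likely to be strongly correlated. A common precaution is to split by patient, so that no patient contributes pulses to both the training and the test set. The sketch below assumes a 20% test fraction and a fixed seed; the function name and toy data are hypothetical.

```python
import random


def patient_level_split(patient_ids, test_fraction=0.2, seed=0):
    """Assign whole patients to train or test so no patient appears in both."""
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    # Every pulse follows its patient into the same split.
    train_idx = [i for i, pid in enumerate(patient_ids) if pid not in test_patients]
    test_idx = [i for i, pid in enumerate(patient_ids) if pid in test_patients]
    return train_idx, test_idx


# Hypothetical toy data: 10 patients with 5 pulses each.
patient_ids = [f"patient{p}" for p in range(10) for _ in range(5)]
train_idx, test_idx = patient_level_split(patient_ids)
train_patients = {patient_ids[i] for i in train_idx}
test_patients = {patient_ids[i] for i in test_idx}
print(len(train_patients), len(test_patients))  # 8 2
print(train_patients & test_patients)           # set()
```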

I'm still quite confused because I have little experience with machine-learning studies in the medical domain.

Thank you for your help.

Sincerely,

Young-Tak Kim