In nearly all of the EEG emotion recognition papers I have read that use the DEAP dataset, each 60-second trial is divided into many small chunks, and the feature extraction methods are then applied to each chunk separately. So from a single 60-second trial we generally get several hundred feature vectors, one per chunk (the DEAP dataset has 1280 such 60-second trials, recorded at a 128 Hz sampling rate).
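To make the chunking step concrete, here is a minimal sketch of how one 60-second trial could be windowed and reduced to per-chunk features. The 2-second window, the synthetic data, and the Welch band-power feature are my own assumptions for illustration, not taken from any particular paper:

```python
import numpy as np
from scipy.signal import welch

fs = 128                                        # DEAP preprocessed sampling rate (Hz)
n_channels = 32                                 # EEG channels in DEAP
trial = np.random.randn(n_channels, 60 * fs)    # placeholder for one 60-second trial

win_sec = 2                                     # assumed chunk length; papers vary
win = win_sec * fs
n_chunks = trial.shape[1] // win                # 30 non-overlapping chunks here

features = []
for i in range(n_chunks):
    chunk = trial[:, i * win:(i + 1) * win]
    # example feature: total power per channel from Welch's PSD
    f, psd = welch(chunk, fs=fs, nperseg=win)
    features.append(psd.sum(axis=1))            # one feature vector per chunk
features = np.vstack(features)

print(features.shape)                           # (30, 32) with these settings
```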
Now, when training the model, these works simply shuffle all the samples and then split them into training and testing sets. For example, if a single 60-second trial yields 100 feature vectors after feature extraction, roughly 60 of them go into the training set and 40 into the testing set. Would this be a fair evaluation of a classification model, given that part of the very video used for testing is also present in the training set?
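As I understand it, the chunk-level split in those papers amounts to something like the following sketch, where chunks from all trials are pooled, shuffled, and split, so chunks from the same trial can land on both sides. The array shapes and the sklearn call are just my own illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Suppose X holds chunk-level feature vectors from all trials and
# trial_ids records which 60-second trial each chunk came from.
n_trials, chunks_per_trial, n_features = 1280, 100, 32
X = np.random.randn(n_trials * chunks_per_trial, n_features)
y = np.random.randint(0, 2, size=len(X))              # e.g. high/low valence label
trial_ids = np.repeat(np.arange(n_trials), chunks_per_trial)

# The usual split: chunks are shuffled individually, so chunks from the
# same trial can end up in both the training and the test set.
X_tr, X_te, y_tr, y_te, id_tr, id_te = train_test_split(
    X, y, trial_ids, test_size=0.4, shuffle=True, random_state=0)

overlap = np.intersect1d(id_tr, id_te)
print(len(overlap), "trials appear in BOTH train and test")   # close to all 1280
```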
And when I train the same model so that all the feature vectors extracted from a single 60-second trial go entirely into either the training set or the testing set (i.e., no trial is split across the two), the accuracy drops significantly compared to the case above, and sometimes the model doesn't learn anything at all.
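The trial-level split I tried corresponds to a group-wise split, for example with sklearn's GroupShuffleSplit, so that every chunk of a given trial stays entirely on one side. Again, this is only a sketch with assumed shapes and names:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Same illustrative setup as above: chunk-level features plus trial IDs.
n_trials, chunks_per_trial, n_features = 1280, 100, 32
X = np.random.randn(n_trials * chunks_per_trial, n_features)
y = np.random.randint(0, 2, size=len(X))
trial_ids = np.repeat(np.arange(n_trials), chunks_per_trial)

# GroupShuffleSplit keeps all chunks of a trial on one side of the split.
gss = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=trial_ids))

X_tr, X_te = X[train_idx], X[test_idx]
y_tr, y_te = y[train_idx], y[test_idx]

# No trial contributes chunks to both sets now.
assert len(np.intersect1d(trial_ids[train_idx], trial_ids[test_idx])) == 0
```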