I'm a graduate student new to deepfake detection, working on my project. I achieved 99% accuracy on a subset of the DFDC (Deepfake Detection Challenge) dataset using CNN models, but accuracy dropped significantly, to 40%, when I tested on a different, unrelated dataset. As a newcomer, I'm unsure whether I should continue with the second dataset or address this drop in performance some other way. Your insights would be greatly appreciated.