I am studying the performance of deep learning (DL) models for abnormality detection in chest X-rays. Because the training data are sparse, I augmented them using two kinds of strategies: (a) traditional augmentation methods, namely Gaussian smoothing, unsharp masking, and minimum filtering; and (b) Generative Adversarial Networks (GANs). Contrary to the existing literature, I find that the DL models perform better with the traditional augmentation methods listed above than with GAN-generated synthetic images. What could explain this performance difference?
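For readers less familiar with the three traditional methods, here is a minimal sketch of what each one does to an image. It assumes grayscale chest X-rays stored as float arrays normalized to [0, 1] and uses `scipy.ndimage`. The helper names and the `sigma`, `amount`, and `size` values are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, minimum_filter


def gaussian_smooth(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Blur the image with an isotropic Gaussian kernel (sigma is an assumed value)."""
    return gaussian_filter(img, sigma=sigma)


def unsharp_mask(img: np.ndarray, sigma: float = 1.0, amount: float = 1.0) -> np.ndarray:
    """Sharpen by adding back the high-frequency residual (img - blurred)."""
    blurred = gaussian_filter(img, sigma=sigma)
    sharpened = img + amount * (img - blurred)
    # Clip so intensities stay in the assumed [0, 1] range.
    return np.clip(sharpened, 0.0, 1.0)


def min_filter(img: np.ndarray, size: int = 3) -> np.ndarray:
    """Replace each pixel with the minimum over a size x size neighbourhood."""
    return minimum_filter(img, size=size)


def augment(img: np.ndarray) -> list:
    """Return the original image plus one variant per traditional augmentation."""
    return [img, gaussian_smooth(img), unsharp_mask(img), min_filter(img)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xray = rng.random((64, 64))  # random stand-in for a normalized grayscale X-ray
    variants = augment(xray)
    print(len(variants), [v.shape for v in variants])
```

Note that all three transforms only perturb the intensity and texture of real images, so each augmented sample keeps the anatomy and the label of its source image.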