I want to use GAN-based data augmentation for semantic segmentation. The original images are real RGB images, and the ground truths are the masks of the areas to segment.
It sounds like you are dealing with a chicken-and-egg problem: to generate good enough GAN samples, you would already need the kind of data (and segmentation quality) you are trying to obtain in the first place.
I have some doubt that this approach would bring a significant gain for semantic segmentation, because a GAN generates "realistic" images, not "real" ones. That is, something may look fine to the human eye yet not exist in the real world.
In addition, you need the GAN to generate paired data (an RGB image and its corresponding label mask), which is harder than generating a single image. Besides, there is a potential mismatch between the two halves of a generated pair.
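To illustrate what "paired generation" means in practice: one common trick is to have the generator emit the image and the mask jointly as extra channels of a single output, so the pair is produced together rather than separately. The toy sketch below is only a shape/plumbing illustration with random weights (no training, no discriminator); the latent size, resolution, and the 0.0 binarization threshold are all illustrative assumptions.

```python
import numpy as np

# Toy sketch (NOT a trained model): a "generator" that maps a latent
# vector to a joint (RGB image, mask) pair by emitting 4 channels at
# once. In a real setup the weights would be learned adversarially.
rng = np.random.default_rng(0)
H, W, LATENT = 32, 32, 16

# Random projection standing in for learned generator weights.
weights = rng.normal(size=(LATENT, H * W * 4))

def generate_pair(z):
    """Map a latent vector z to an (rgb, mask) pair produced jointly."""
    out = np.tanh(z @ weights).reshape(H, W, 4)   # 4 channels at once
    rgb = (out[..., :3] + 1.0) / 2.0              # rescale to [0, 1]
    mask = (out[..., 3] > 0.0).astype(np.uint8)   # binarize 4th channel
    return rgb, mask

rgb, mask = generate_pair(rng.normal(size=LATENT))
print(rgb.shape, mask.shape)  # (32, 32, 3) (32, 32)
```

Producing both channels from one forward pass at least ties the image and mask to the same latent code, which is what conditional/joint GAN setups rely on; it does not by itself guarantee the pair is consistent, which is exactly the mismatch risk mentioned above.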
If you are dealing with a small-dataset problem, then how can you train a GAN that generates data suitable for your dataset in the first place?
In summary, I think it is quite a difficult problem. Technically it is possible, but whether you will see a performance gain is questionable. One benefit I can see is that if you define the loss differently for real and generated data, it might improve the robustness of your method a little.
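One minimal way to "define the loss differently" is to down-weight generated pairs in the segmentation loss so that real data dominates training. The sketch below uses plain numpy and per-pixel binary cross-entropy; the 0.5 weight for generated samples is an illustrative assumption you would tune on a validation set.

```python
import numpy as np

def pixel_bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all pixels of one predicted mask."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def weighted_batch_loss(preds, targets, is_real):
    """Average per-sample BCE, weighting real pairs more than generated ones.

    is_real: boolean array, True for real (image, mask) pairs,
             False for GAN-generated pairs.
    """
    # 1.0 vs 0.5 is an illustrative choice, not a recommended value.
    weights = np.where(is_real, 1.0, 0.5)
    losses = np.array([pixel_bce(p, t) for p, t in zip(preds, targets)])
    return float(np.sum(weights * losses) / np.sum(weights))
```

The same idea carries over directly to framework losses (e.g. per-sample reduction followed by a weighted mean); the point is simply that synthetic pairs contribute less gradient than real ones.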