I have a dataset with a lot of financial indicators that contain outliers and extreme values. I cannot remove them because they are crucial for my next step (identifying outliers using DBSCAN). I only want to reduce the dimensionality, so I used an autoencoder. I split my dataset into a train set and a test set. I fitted a MinMaxScaler on the train set (my activation is sigmoid) and applied the same scaler to the test set, with clip=True for values that fall outside the train range. The trained model has a good MAE. Now I would like to apply the model to my full dataset, which combines the records from the train and test sets, and this is where the problem appears. I cannot use the normalization from the train set, because it depends strictly on the min and max values found in the train set. In the next step of my analysis, clustering, variables whose min or max values are not in the train set will end up with high values. With clip=True, those values will be distorted instead. Any ideas?
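To make the problem concrete, here is a minimal sketch of the setup described above, assuming scikit-learn's MinMaxScaler. The toy numbers and variable names are made up for illustration and are not from my real data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (assumption): one indicator whose most extreme value
# appears only in the test split.
X_train = np.array([[1.0], [2.0], [5.0]])
X_test = np.array([[3.0], [50.0]])

# Both scalers learn min=1 and max=5 from the train split only.
clipped = MinMaxScaler(clip=True).fit(X_train)
unclipped = MinMaxScaler(clip=False).fit(X_train)

# With clip=True the extreme value is squashed to the upper bound 1.0,
# so it becomes indistinguishable from the train maximum.
test_clipped = clipped.transform(X_test)

# With clip=False the same value lands far outside [0, 1]: (50 - 1) / (5 - 1).
test_unclipped = unclipped.transform(X_test)

print(test_clipped.ravel())
print(test_unclipped.ravel())
```

The clipped version hides exactly the kind of extreme record DBSCAN is supposed to find, while the unclipped version feeds the network inputs far outside the range it was trained on.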
I tried normalizing my full dataset with parameters computed from the full dataset. Then I applied my model, and the MAE is also good. Can I leave it at that, knowing that it is probably a workable, but not ideal, option?
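A rough sketch of what refitting on the full dataset does, again assuming scikit-learn's MinMaxScaler and made-up toy values: the train rows themselves get different scaled values, so the model sees inputs on a different scale than during training.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (assumption): the extreme value 50 exists only in the test split.
X_train = np.array([[1.0], [2.0], [5.0]])
X_test = np.array([[3.0], [50.0]])
X_full = np.vstack([X_train, X_test])

scaler_train = MinMaxScaler().fit(X_train)  # scaling used during training
scaler_full = MinMaxScaler().fit(X_full)    # scaling refitted on everything

# The same train rows under the two scalings.
train_old = scaler_train.transform(X_train)
train_new = scaler_full.transform(X_train)

# Largest change in any scaled train value caused by refitting.
max_shift = np.abs(train_old - train_new).max()

# After refitting, every record of the full dataset lies inside [0, 1].
full_scaled = scaler_full.transform(X_full)

print(max_shift)
print(full_scaled.min(), full_scaled.max())
```

If the shift is large for some indicators, it may be worth comparing the MAE separately on the train and test records under the new scaling, not only on the full dataset.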