I've normalized my training data using its own mean and standard deviation. Now, in the testing phase, should I normalize the test data using the mean and standard deviation of the test data or those of the training data?
This all depends on the size of the data sets and on whether both the train and test sets are equally representative of the domain you are trying to model. If you have thousands of data points and the test set is fully representative of the training set (hard to prove), then either method will be fine. If you are using a small but representative test set, then normalizing with the training parameters only is best, since sampling error may negatively bias the predictions. If the test set is not very representative of the training set, then you are comparing apples with oranges and should think again about your sampling procedure.

If, and only if, you are not trying to produce a generalized predictive model but are instead trying to understand the underlying mathematical structure of the given data set, then splitting the data after normalization is reasonably safe. In this case the MSE (for example) is used only as an internal comparative measure for optimizing the model structure. Since this internal measure is relative, there is no worry about absolute MSE values; you are just concerned about overfitting to noise.
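As a rough heuristic for "is the test set representative?", one can compare per-feature summary statistics between the two sets. This is only a sketch, not a formal statistical test, and the tolerance of 0.25 standard deviations is an arbitrary assumption:

```python
import numpy as np

def representativeness_check(X_train, X_test, tol=0.25):
    """Flag features whose test-set mean shifts by more than `tol`
    training standard deviations. A heuristic, not a formal test."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    safe_sigma = np.where(sigma > 0, sigma, 1.0)  # avoid division by zero
    shift = np.abs(X_test.mean(axis=0) - mu) / safe_sigma
    return np.where(shift > tol)[0]  # indices of suspicious features
```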
Yes, use the training data mean and standard deviation. Consider the scaling process part of the model generated from the training data; the test data then tests the generality of the model combined with its pre-processing.

Often people will scale all the data and then split it into train/test sets. The testing will then validate the model alone, which may be useful if your aim is not to produce a predictive algorithm but to understand the structure of the data (i.e. the important variables).
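A minimal sketch of the "scaling is part of the model" approach, using scikit-learn's StandardScaler fit on the training data only (the toy data here is just for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # toy training data
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))    # toy test data

scaler = StandardScaler().fit(X_train)    # learn mean/std from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the training parameters
```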
I'd usually follow the training-data-based approach described by David. However, if the test data is expected to vary strongly from the training data, it might also be interesting to use the test data for normalization. This goes slightly in the direction of domain adaptation. Note, however, that the applicability of the latter approach depends heavily on whether your problem setting admits such a treatment (e.g. in classification you often have only one test point at a time at your disposal, which renders the test-mean approach inapplicable).
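A sketch of that alternative, assuming (and this is the key assumption) that a whole batch of test points is available at once; with a single test point this is impossible, as noted above:

```python
import numpy as np

def zscore_batch(X):
    """Standardize a batch using its own mean/std (domain-adaptation-style).
    Only meaningful when the whole test batch is available at once."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```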
David, what would be a good test of whether the test data is representative of the training set? For example, if a point falls just outside the ±1 range after normalization, would I call it an outlier?
As you said, if the test set is not very representative of the training set, then I'm comparing apples with oranges, and I think that is exactly my problem: it is a one-class classification task, so my training data contain only one class (normal samples), while the test set contains two classes (normal and abnormal samples)!
I appreciate your advice. Maybe it's just me, but I think that if I normalize the whole test set separately (with the test mean and standard deviation), the final system won't work online, because in a real-world problem I don't have the whole test set and the decision has to be made on a single test instance. Isn't that right?
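A sketch of that online setting: the training statistics are computed once, offline, and then applied to each incoming instance (the toy training set stands in for the real one):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))  # stand-in for the real training set

# Computed once, offline, on the training data
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)

def scale_instance(x):
    """Scale one incoming test instance using the stored training parameters."""
    return (x - train_mean) / train_std
```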
Yes, I think my problem is that the test data vary strongly from the training data. I'm facing a one-class classification problem. I've used min-max normalization, but it didn't work properly for my problem, so I need to move to z-score normalization. I'm just wondering how I should do this.
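For reference, z-score normalization with the training parameters looks like this; in the one-class setting, the training set here contains only the normal class:

```python
import numpy as np

def fit_zscore(X_train):
    """Learn z-score parameters from the (normal-only) training data."""
    return X_train.mean(axis=0), X_train.std(axis=0)

def apply_zscore(X, mean, std):
    """z = (x - mean) / std, using the stored training parameters."""
    return (X - mean) / std
```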
Concerning any scaling/standardization/normalization pre-processing factors:

My default choice is to first find out whether there is any reasonable domain-specific knowledge for the scaling. Otherwise, I determine any (!) pre-processing parameters on the training data (or the current cross-validation training fold) and apply them to the test data (test fold), as in the sketch below.

By the way, you might encounter situations where your test data is available only point by point. In that situation you have little choice but to use the factors determined on the training data. Certainly, using factors determined on the whole data set can lead to severe overfitting.
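A minimal scikit-learn sketch of keeping the pre-processing inside each cross-validation fold: the Pipeline refits the scaler on every training fold automatically, so the test fold never leaks into the scaling parameters. The classifier and toy data are just illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)  # toy data

# The scaler is fit on each training fold only, never on the test fold
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5)
```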
Is it possible to do feature normalization with respect to class? For example: a 10×10 data matrix with two classes, each class a 5×5 block. Now normalize the 25 values of class 1 and the 25 values of class 2 separately. Is this process acceptable?
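For what it's worth, a sketch of the per-class normalization described here; note that class labels are unknown at prediction time, so this can only serve descriptive analysis, not a deployable pre-processing step:

```python
import numpy as np

def normalize_per_class(X, y):
    """Standardize each class's rows with that class's own mean/std.
    Usable only for descriptive analysis, since test labels are unknown."""
    X_out = X.astype(float).copy()
    for c in np.unique(y):
        rows = (y == c)
        X_out[rows] = (X[rows] - X[rows].mean(axis=0)) / X[rows].std(axis=0)
    return X_out
```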
I had a similar situation once. I realized it after the model was developed, when I started to deploy it in an application for real-time scoring. The application was designed to predict the profit margin for an unseen/new project. When the user enters the project details, the idea of taking the mean/standard deviation of the test data, which is just a single record, really didn't make sense. I started with the mean and median values of the training data. However, the application was programmed to re-evaluate these values as soon as new data was entered; if the difference became large, the models needed to be refreshed.
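A hypothetical sketch of that refresh logic: keep the training statistics, accumulate the records seen in production, and flag when their mean drifts too far. The class name and the threshold of one standard deviation are assumptions for illustration:

```python
import numpy as np

class DriftMonitor:
    """Track incoming records and flag when their mean drifts from the
    training mean by more than `threshold` training standard deviations."""

    def __init__(self, train_mean, train_std, threshold=1.0):
        self.train_mean = train_mean
        self.train_std = train_std
        self.threshold = threshold  # arbitrary choice, tune per application
        self.seen = []

    def add(self, record):
        self.seen.append(record)
        shift = np.abs(np.mean(self.seen, axis=0) - self.train_mean) / self.train_std
        return bool(np.any(shift > self.threshold))  # True => refresh the model
```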