In Random Forest machine learning, the predictions are within the range of the predictor variables, and it is recommended to use the forest model within that range. Is that also true for an ANN, which works on scaled input variables?
This is not exactly the situation. Briefly, the number (and type) of variables plays a role, as does the number of hidden layers. Note that having a large number of variables doesn't mean the model will have high performance.
Thanks for your quick reply. I think I was not quite clear in my question, so let me clarify a little. Let's assume you are using a Random Forest with 8 variables (x1...x8) to predict y, and suppose that the predictor variable x1 ranges from 1 to 200 in the training data. In that case, is it recommended that any test data on which the forest model is used have x1 within the same range in which the model was trained? Please correct me if I am wrong.
I believe this is not the case for an ANN, since we scale each variable from 0 to 1. Am I right? I would appreciate your reply. Thanks!
Your question is about data normalization. In many cases, data normalization can produce better outputs. You should run the model using the same range as the training data. If the test data is out of range, make sure it is not too different from (i.e. close to) the training data.
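A minimal sketch of the point above, using hypothetical values for the x1 predictor: min-max scaling is fitted on the training data only, so a test value outside the training range maps outside [0, 1], which makes out-of-range test data easy to detect.

```python
import numpy as np

# Hypothetical training and test columns for a single predictor x1.
x1_train = np.array([1.0, 50.0, 120.0, 200.0])
x1_test = np.array([10.0, 180.0, 250.0])  # 250 lies outside the training range

# Fit min-max scaling on the TRAINING data only.
lo, hi = x1_train.min(), x1_train.max()

def scale(x):
    return (x - lo) / (hi - lo)

print(scale(x1_train))  # all values fall in [0, 1]
print(scale(x1_test))   # 250 maps to about 1.25, i.e. outside [0, 1]
```

Scaled test values slightly above 1 or below 0 are usually tolerable (close to the training data); values far outside that range signal that the model is being asked to extrapolate.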
Thanks for your quick reply. I think I was not quite clear in my question, so let me clarify a little. Let's assume you are using a Random Forest with 8 variables (x1...x8) to predict y, and suppose that the predictor variable x1 ranges from 1 to 200 in the training data. In that case, is it recommended that any test data on which the forest model is used have x1 within the same range in which the model was trained? Please correct me if I am wrong.
Answer
For any machine learning algorithm (including RF and MLP), the training and test sets have to be compatible. An attribute like x1 has to be available in both the training and test sets, though with a different number of instances/examples; this depends on the method you are using for evaluation/testing.
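On the Random Forest part of the question, a toy illustration may help. A regression tree leaf predicts the mean of the training targets that fell into it, so neither a single tree nor a forest of trees can predict outside the range of the training targets. The sketch below uses a hypothetical one-split "stump" rather than a full forest, but the same bound applies:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(1, 200, size=100)   # hypothetical predictor x1 in [1, 200]
y_train = 2.0 * x_train                   # hypothetical linear response

# One-split regression stump: predict the mean of y on each side of the median x.
threshold = np.median(x_train)
left_mean = y_train[x_train <= threshold].mean()
right_mean = y_train[x_train > threshold].mean()

def stump_predict(x):
    return np.where(x <= threshold, left_mean, right_mean)

# Even for x far outside the training range, the prediction is capped at
# right_mean, which is below y_train.max(): trees do not extrapolate.
print(stump_predict(np.array([1000.0])))
```

This is why, with a Random Forest, applying the model to test data whose predictors lie far outside the training range is discouraged: the trees simply return the nearest leaf average.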
I believe this is not the case for an ANN, since we scale each variable from 0 to 1. Am I right? I would appreciate your reply.
Answer
The initial weights in an ANN are assigned randomly, usually in the range [-0.5, 0.5].
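A minimal sketch of that convention, assuming a hypothetical layer of 4 hidden units fed by the 8 inputs from the earlier example:

```python
import numpy as np

rng = np.random.default_rng(42)
# Weight matrix for 8 inputs feeding 4 hidden units, drawn uniformly
# from [-0.5, 0.5] (the common initialization range mentioned above).
W = rng.uniform(-0.5, 0.5, size=(8, 4))
print(W.min(), W.max())  # both within [-0.5, 0.5]
```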
My output layer (i.e., the response variable) also contains negative values. Does that matter during scaling? Or should I make all values in the response variable zero or positive by shifting them by the variable's lowest value?
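A short sketch on this last point, using hypothetical response values: ordinary min-max scaling already maps negative values into [0, 1] with no separate shift, and the transform can be inverted to report predictions on the original scale.

```python
import numpy as np

# Hypothetical response variable containing negative values.
y = np.array([-5.0, -1.0, 0.0, 3.0, 10.0])

# Min-max scaling maps the full range, negatives included, to [0, 1];
# no separate shift to make the values positive is needed.
y_min, y_max = y.min(), y.max()
y_scaled = (y - y_min) / (y_max - y_min)

# Invert the scaling to recover values on the original scale.
y_back = y_scaled * (y_max - y_min) + y_min
print(y_scaled)               # values in [0, 1]
print(np.allclose(y_back, y)) # True
```

Whichever transform is used for the response, the same inverse must be applied to the network's outputs before evaluating them against the raw data.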