I have a data set that I want to use for forecasting with an artificial neural network. How do I normalize the data set? Is there a MATLAB procedure to do it? Can I use an Excel spreadsheet with formulas and then use the normalized data in MATLAB?
Using highly variable data to train ANN models can cause the network to over-weight the inputs with the largest scales. So, both for practical reasons and to bring all inputs into a comparable range, normalizing or standardizing the inputs of an ANN can make training faster and reduce the chance of getting stuck in local optima. If you use MATLAB, you can use either mapminmax or mapstd as follows:
[yn, ps] = mapminmax(x, ymin, ymax)
Min-max (mapminmax) algorithm: y = (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin
[yn, ps] = mapstd(x, ymean, ystd)
Z-score (mapstd) algorithm: y = (x - xmean) * (ystd / xstd) + ymean
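For reference, here is a minimal usage sketch; the data matrix X is made up, and note that mapminmax and mapstd operate row-wise, treating each row as one variable and each column as one sample:
X = [1 4 2 8; 100 400 200 800];      % made-up data: one feature per row
[Xn, psMinMax] = mapminmax(X);       % each row scaled into [-1, 1] (the defaults)
[Xz, psStd] = mapstd(X);             % each row scaled to zero mean, unit std
Xtest = [3 5; 300 500];              % made-up new data (e.g., a test set)
XtestN = mapminmax('apply', Xtest, psMinMax);  % reuse the training-set scaling
Xback = mapminmax('reverse', Xn, psMinMax);    % map back to the original scale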
Assuming your data (variable X) is organized so that rows are the samples and columns are the features, you can normalize it in MATLAB by simply doing:
Xnorm = (X - min(X)) ./ (max(X) - min(X))
if you want everything between 0 and 1. Or (the way I prefer) you can normalize it to zero mean and unit standard deviation, as in the sketch below.
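A minimal sketch of that zero-mean, unit-standard-deviation (z-score) variant, under the same rows-are-samples assumption (implicit expansion needs MATLAB R2016b or later; older releases can use bsxfun):
X = rand(100, 3);           % made-up data: 100 samples, 3 features
mu = mean(X);               % per-feature (column) means
sigma = std(X);             % per-feature standard deviations
Xnorm = (X - mu) ./ sigma;  % each column now has zero mean and unit std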
If you are using MATLAB's Neural Network Toolbox, then normalization, dimensionality reduction, and missing-data handling are all available. The choice of methods used to process the data is controlled by the network's input and output processing functions (processFcns).
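As a hedged sketch (the 10-neuron feedforward network is an arbitrary choice), these processing functions can be set per input, and train then applies them automatically to both training and new data:
net = feedforwardnet(10);              % hidden layer size chosen arbitrarily
net.inputs{1}.processFcns = { ...
    'fixunknowns', ...                 % encode missing (NaN) input values
    'removeconstantrows', ...          % drop inputs that never change
    'mapstd'};                         % z-score each remaining input row
% processpca could be added for dimensionality reduction;
% mapminmax is the min-max alternative to mapstd.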
As Carlos said, you can use those formulas in MATLAB to normalize your data. However, I have often trained a neural network on both the raw data and the normalized data, and found that the raw data gave better results. It seems that because normalized inputs all share the same range, the network has a harder time learning the distribution of the data. So I suggest testing both and then deciding whether or not to normalize your data.
Behzad's answer is really useful, and I forgot this detail.
The only caveat of not normalizing the data is that if the initial weights make the neural net saturate (due to poor scaling), the network may never converge.
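To illustrate the point, a small sketch with the tansig activation, its derivative computed by hand as 1 - a.^2: for poorly scaled inputs the output pins near +/-1 and the gradient vanishes, so weight updates stall.
n  = [0.5 5 50];    % net inputs of increasing magnitude
a  = tansig(n)      % approx. 0.46, 0.9999, 1.0000: the neuron saturates
da = 1 - a.^2       % gradients approx. 0.79, 2e-4, 0: learning stalls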
There are several ways of normalizing data. I think the simplest is to force the mean to one. To do this you can execute the following (assuming your data is in a vector called X):
Xnorm = X ./ mean(X);
This preserves the shape of the distribution, e.g., the coefficient of variation, since every value is divided by the same constant (note that the variance itself is rescaled, not preserved).
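A quick check of that claim (the numbers here are made up): the mean becomes exactly one while the coefficient of variation is unchanged.
X = [2 4 6 8];                        % made-up sample vector
Xnorm = X ./ mean(X);                 % mean(Xnorm) is exactly 1
cvBefore = std(X) / mean(X);          % coefficient of variation before...
cvAfter  = std(Xnorm) / mean(Xnorm);  % ...and after: identical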
I performed this procedure in the following papers with good results:
2016, "Improved Shape Parameter Estimation in K Clutter with Neural Networks and Deep Learning". International Journal of Interactive Multimedia and Artificial Intelligence, Vol. 3, No. 7, pp. 3-13, Spain, ISSN 1989-1660.
2016, "Implementation of an Algorithm for the Estimation of the Sea Clutter Distribution and Parameters". Journal of Tropical Engineering, San José, Costa Rica.
2015, "A Neural Network Approach to Weibull Distributed Sea Clutter Parameter's Estimation". Revista Iberoamericana de Inteligencia Artificial, Vol. 18, No. 56, pp. 3-13, Spain, ISSN 1988-3064.
I use Excel for standardization of the input data. There are several methods, each giving a different range of inputs: linear and non-linear (vector, Manhattan, Weitendorf's, maximum, and others). The standardization method is one of the ANN parameters to optimize: different standardization methods give different ANN prediction accuracy. "The influence of input data standardization method on prediction accuracy of artificial neural networks" will be published in two weeks' time.
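For concreteness, a minimal column-wise sketch of a few of the methods named above, assuming features are in columns; the formulas follow the common MCDM definitions, so check them against the paper (vecnorm requires MATLAB R2017b or later):
X = rand(20, 4);                           % made-up data: 20 samples, 4 features
Xvec = X ./ vecnorm(X);                    % vector (Euclidean) normalization
Xman = X ./ sum(abs(X));                   % Manhattan (sum of absolute values)
Xmax = X ./ max(abs(X));                   % maximum normalization
Xwei = (X - min(X)) ./ (max(X) - min(X));  % Weitendorf's linear normalization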