I have a set of 100 timesteps of data to train the network, with 5 variables as input and 3 variables as output. Please give me suggestions: of the 11 days I am using to validate the results, some of the predictions deviate from their targets by almost 20%.
The network architecture is 5-12-12-3, and I used 2-2 delays for the input and feedback regressors.
I would not recommend using a genetic algorithm (GA) for optimizing a neural net; it will take too long and won't be able to fine-tune it.
How did you choose the 5-12-12-3 architecture? You have more parameters than examples, so you should use regularization (e.g. weight decay) or a simpler architecture.
Since you have few examples, don't worry about stochastic gradient descent; you should use a second-order optimization method. I highly recommend Levenberg-Marquardt and conjugate gradient (a small sketch of the LM update is below).
If you have MATLAB, the Neural Network Toolbox offers all of those options.
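For reference, here is a minimal NumPy sketch of a Levenberg-Marquardt step on a toy least-squares problem. The exponential model and all variable names are mine, purely for illustration; a toolbox implementation handles the Jacobian and damping schedule for you.

```python
import numpy as np

# Levenberg-Marquardt: minimize sum(r(w)^2) with the damped Gauss-Newton step
#   w <- w - (J'J + mu*I)^{-1} J'r
# Toy model: fit y = a*exp(b*x); the parameters are w = (a, b).

def residuals(w, x, y):
    a, b = w
    return a * np.exp(b * x) - y

def jacobian(w, x):
    a, b = w
    J = np.empty((x.size, 2))
    J[:, 0] = np.exp(b * x)          # d r / d a
    J[:, 1] = a * x * np.exp(b * x)  # d r / d b
    return J

def lm_fit(x, y, w0, mu=1e-2, n_iter=100):
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        r = residuals(w, x, y)
        J = jacobian(w, x)
        step = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ r)
        w_new = w - step
        # Accept the step and shrink the damping if the error improved, else grow it.
        if np.sum(residuals(w_new, x, y) ** 2) < np.sum(r ** 2):
            w, mu = w_new, mu * 0.7
        else:
            mu *= 2.0
    return w

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * x) + 0.05 * np.random.randn(50)
print(lm_fit(x, y, w0=[1.0, 1.0]))   # should recover roughly (2.0, 1.5)
```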
I am already using the LM algorithm, and I tested the neural network to arrive at the mentioned architecture. How do you conclude that I have more parameters than examples? I have 111 data sets, from which I use 100 to train the NN and 11 to validate it. Since I need to duplicate the networks, the toolbox does not give me adequate control over the structure of the network.
How do I implement weight decay or Bayesian regularization? Please share your work if you have prior experience.
Atiya, if you have a 5-12-12-3 fully connected network, you have 5*12 + 12*12 + 12*3 weights plus biases, i.e. at least 240 parameters! With only 100 training examples, that makes the system heavily over-parameterized (in theory).
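The count, spelled out in Python (this ignores the extra input weights coming from the 2-2 tapped delays, so the real number is even higher):

```python
# Parameter count for a fully connected 5-12-12-3 network.
layers = [5, 12, 12, 3]
weights = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))  # 5*12 + 12*12 + 12*3 = 240
biases = sum(layers[1:])                                                # 12 + 12 + 3 = 27
print(weights, biases, weights + biases)                                # 240 27 267
```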
Weight decay is very easy: you just add an L2 penalty (the squared Euclidean norm of the weights) to the objective function, with an appropriate scaling factor (this hyperparameter can be found with cross-validation). If you're using MATLAB (or another NN library), that's probably a built-in option. If you're doing it by hand, just re-derive the gradient with this additional term.
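A rough sketch of what that extra term looks like when derived by hand (plain NumPy, one weight matrix only; `lam` is the decay strength you would pick by cross-validation and `lr` the learning rate, both names mine):

```python
import numpy as np

# Weight decay: add (lam/2)*||W||^2 to the objective. Its gradient is simply
# lam*W, so the gradient of the data-fit term gains one extra additive term.

def l2_penalty(W, lam):
    return 0.5 * lam * np.sum(W ** 2)

def update_weights(W, grad_data, lam, lr):
    # grad_data: gradient of the data-fit term (e.g. MSE) with respect to W.
    grad_total = grad_data + lam * W   # extra term from the L2 penalty
    return W - lr * grad_total
```

The same term is added for every weight matrix in the network; biases are usually left out of the penalty.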
Another easy way to regularize is to use dropout, a recent technique that has been very successful with deep neural networks. For every training example you randomly hide a fraction of the hidden units (usually around 50%), and after training you multiply the outgoing weights by the keep probability. This way the neurons are forced to learn features that do not rely on each other being present (they cannot co-adapt).
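A rough sketch of dropout on one hidden layer (plain NumPy; the tanh nonlinearity and all names are mine, just to show the masking during training and the rescaling at test time):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # probability of KEEPING a hidden unit

def hidden_train(x, W, b):
    h = np.tanh(W @ x + b)            # hidden activations
    mask = rng.random(h.shape) < p    # keep each unit with probability p
    return h * mask                   # dropped units contribute nothing

def hidden_test(x, W, b):
    # Scaling the layer output by p is equivalent to multiplying the
    # outgoing weights by p after training.
    return p * np.tanh(W @ x + b)
```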
I don't know of any easy way to implement Bayesian regularization.
I disagree. The minimum number of examples is twice the number of hidden neurons, no matter how many features there are. There is a paper by Muller that proves it, and my own experience with many data sets confirms it.
I believe the proposed ANN architecture is too complex; in my experience with my training algorithm, one hidden layer with 5 hidden neurons is sufficient for most practical purposes.
I am ready to prove it if Atiya can send me his data (to [email protected]).
I may send you the data, but I need solid proof of what you are saying because I need to develop a model for it quickly. Please assure me that it will work; I am ready to share the techniques I am using.