This can come from the dimensionality of your data. This phenomenon is widely described as the "curse of dimensionality", and it has always been a problem: high dimensionality often prevents error-minimization algorithms from converging properly.
This phenomenon is also known as the Hughes phenomenon!
I agree with Ludovic that the high number of dimensions (input and output variables) is one reason. Besides that, the percentage of the data held out for validation, which is used to terminate training, must be set carefully (0-100). The validation threshold, or the number of epochs, may also be set very large; a sketch of these settings follows below.
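For illustration only, here is how those termination knobs look in scikit-learn's MLPClassifier. This is an assumption, not the asker's actual Weka setup; Weka's MultilayerPerceptron exposes equivalent options.

```python
# Hedged sketch of the training-termination settings discussed above,
# using scikit-learn's MLPClassifier as a stand-in for the Weka model.
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    hidden_layer_sizes=(50,),  # one hidden layer of 50 units; an assumption
    max_iter=500,              # cap on epochs -- a very large value means long runs
    early_stopping=True,       # hold out part of the training data for validation
    validation_fraction=0.1,   # the "percentage size of validation" (10% here)
    n_iter_no_change=10,       # stop if the validation score stalls for 10 epochs
    tol=1e-4,                  # the improvement threshold
)
```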
Long training times, even of several days, are not unusual for Technical Neural Networks. But, as the others have mentioned before, that depends on a lot of circumstances.
So without further information about your set-up it is hard to determine if there is a real problem, or just the "normal" training time.
Can you provide more information?
Type of network, number of layers, number of neurons, size of the data set, input dimension, regression task or classification, ...
What do you mean by "...but yet to settle down?"
Are the weight changes oscillating?
Is the learning curve going up and down?
And, besides that, have you checked the TNN implementation?
The links you shared were really helpful... yes, my dataset has some 100 dimensions.
Jabar H. Yousif: Thank you, Sir, for your time. I am actually using 10-fold cross-validation.
Nils Goerke: Thank you, Sir, for commenting. Actually, I am using the Weka tool and have not tuned any parameters. My dataset has some 400,000 (4 lakh) records, each of about 100 dimensions. I am attaching a screenshot here so that you can better see whether there is a problem with the parameter settings...
Are you sure these 100 features are relevant? Are they preprocessed? Have you shuffled the entries so that sibling rows contain different classes? Have you scaled the data ((0,1) in the case of a sigmoid activation function, (-1,+1) in the case of tanh)? Check the Weka docs for which activation functions they need; a small sketch of the shuffling and scaling steps follows below.
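As a minimal sketch of those two preprocessing steps, assuming scikit-learn as a stand-in (Weka has equivalent filters such as Randomize and Normalize):

```python
# Hedged sketch: shuffle rows and rescale features before training.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle

X = np.random.rand(1000, 100)      # placeholder for the ~100-dimensional data
y = np.random.randint(0, 3, 1000)  # placeholder class labels

# Shuffle so that sibling rows mix classes instead of arriving sorted by class.
X, y = shuffle(X, y, random_state=42)

# Scale to (0, 1) for sigmoid units, or (-1, +1) for tanh units.
X_sigm = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
X_tanh = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
```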
Again, what are your outputs? They can be coded in many ways, but this is the usual approach:
Class -> Expected ANN Output
1 -> 1 0 0 (or +1 -1 -1 if tanh activation is used)
2 -> 0 1 0 (or -1 +1 -1)
3 -> 0 0 1 (or -1 -1 +1, respectively)
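For illustration, a tiny numpy sketch of this one-hot coding (the three class labels are assumed):

```python
import numpy as np

labels = np.array([1, 2, 3, 1])   # hypothetical class labels
one_hot = np.eye(3)[labels - 1]   # rows follow the table above: 1 0 0, 0 1 0, ...
tanh_coded = 2 * one_hot - 1      # maps 0/1 to -1/+1 for tanh output units
```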
As was already mentioned, check how your training and validation curves are behaving. You should get graphs like those at http://stats.stackexchange.com/questions/131233/neural-network-over-fitting and apply early stopping (or at least set an appropriate maximum epoch count to get a result similar to the first image); a plotting sketch follows below.
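One hedged way to inspect those curves, again assuming scikit-learn rather than the asker's Weka setup:

```python
# Sketch: fit a small network and plot its training/validation curves.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic placeholder data roughly matching the asker's 100 dimensions.
X, y = make_classification(n_samples=2000, n_features=100, n_classes=3,
                           n_informative=10, random_state=0)

clf = MLPClassifier(early_stopping=True, validation_fraction=0.1,
                    max_iter=300, random_state=0).fit(X, y)

plt.plot(clf.loss_curve_, label="training loss")
plt.plot(clf.validation_scores_, label="validation accuracy")
plt.xlabel("epoch")
plt.legend()
plt.show()
```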
One way to resolve this kind of problem is to apply a dimensionality reduction method in order to reduce the data dimension while preserving the essential information.
In this sense, different approaches already exist (linear vs. nonlinear); a sketch follows below.
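As a minimal sketch of the linear case, here is PCA via scikit-learn (assumed as a stand-in; Weka offers a PrincipalComponents filter):

```python
# Hedged sketch: reduce 100 dimensions while preserving most of the variance.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 100)   # placeholder 100-dimensional data
pca = PCA(n_components=0.95)    # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)          # usually far fewer than 100 columns remain
```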
Did you try rescaling the data to the (-1, 1) range (see the scaling sketch above)? This can sometimes help. I am working on exactly this problem: why some computational intelligence algorithms, when working with multidimensional data, perform better when it is rescaled to this range instead of (0, 1).
Of course, the dimensionality, as mentioned above, is probably the main problem...
Did you get any error, or even the classifier model, in the output? If not, then there is probably a problem either with the algorithm or with the heap size.
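If it is the heap, the JVM memory limit can be raised with the standard -Xmx flag when launching Weka (the 4g value below is just an example, not a recommendation for your machine):

```
java -Xmx4g -jar weka.jar
```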