First, how much data do you have? Given a small dataset, MLR might outperform an ANN; neural networks typically require a relatively large amount of data.
Second, what is the associated training accuracy? If your training accuracy is very high (>99%), your network may be overfitted and will therefore perform poorly on test samples.
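A quick way to check for the overfitting described above is to compare performance on the training set against a held-out test set. The sketch below is illustrative: the synthetic dataset, network size, and split ratio are placeholder assumptions, not details from this thread.

```python
# Hypothetical sketch: detect overfitting by comparing train vs. test R^2.
# Dataset shape mirrors the one mentioned later in the thread (~1400 samples,
# 100 predictors), but the data itself is synthetic.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=1400, n_features=100, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

nn = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
nn.fit(X_train, y_train)

train_r2 = nn.score(X_train, y_train)  # R^2 on data the model has seen
test_r2 = nn.score(X_test, y_test)     # R^2 on unseen data

# A large gap (train R^2 near 1.0, test R^2 much lower) signals overfitting.
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

If the two scores diverge sharply, regularization, a smaller network, or more data is usually the next step.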
Third, if your model performs as expected in one language but not in the other, I would recommend checking your code and the implementation of the network structure carefully.
Obviously, an NN can 'remember' almost everything but may lack the predictability that is important for good AI/ML models, while a properly built linear regression model y = ax + b is excellent at prediction, isn't it? Over-optimization of an NN is a typical problem, solvable by regularization techniques, by decreasing (!) the number of layers, and by applying the network only to 'important features'. You can use Random Forest first to identify feature importance, and then use an NN on top. See this article: "Atomic Machine Learning".
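The two-stage pipeline suggested above (Random Forest to rank features, then a small regularized NN on only the top-ranked ones) can be sketched roughly as follows. The number of kept features, network size, and regularization strength are all illustrative assumptions.

```python
# Hedged sketch: Random Forest feature importance, then a smaller NN
# trained only on the most important features. Data is synthetic.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=1400, n_features=100, n_informative=10,
                       random_state=0)

# Stage 1: rank features by Random Forest importance.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:10]  # keep top 10 (assumed cutoff)
X_reduced = X[:, top]

# Stage 2: a deliberately small NN with L2 regularization (alpha) on the
# reduced feature set, per the "decrease the number of layers" advice.
nn = MLPRegressor(hidden_layer_sizes=(32,), alpha=1e-2,
                  max_iter=1000, random_state=0)
nn.fit(X_reduced, y)
print("selected feature indices:", sorted(top.tolist()))
```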
What is the size of your data? Is it big or small? MLR is a parametric model that assumes certain criteria are met. Neural networks need big data because they use many parameters. That may be one reason for this difference. Please read more about parametric vs. non-parametric models.
Thanks for your valuable time, Sergei Eremenko. I'll definitely follow the suggested article. @anura
@Anuraj Nayarisseri The sample size of my data is around 1400; I think it's too small considering the complexity of my model with 100 independent variables.
Jeevan C.: There is a method in R, "accuracy", that is typically used to calculate model accuracy. Certainly, with 100 independent variables and only 1400 data points, the model might overfit the training data.
Also check https://www.researchgate.net/publication/222679730_A_comparison_of_neural_network_and_multiple_regression_analysis_in_modeling_capital_structure
Jeevan C., yes, right. With 100 variables, first select the important and significant ones using Lasso, Ridge, or Random Forest. Once you have selected the features, build the models. Alternatively, use PCA to reduce the dimensionality before building the models.
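The Lasso-based selection step mentioned above relies on L1 regularization driving the coefficients of unimportant variables to exactly zero. A minimal sketch, assuming synthetic data and a placeholder alpha value:

```python
# Illustrative Lasso feature selection: features whose coefficients survive
# the L1 penalty are kept. The alpha value here is an arbitrary assumption
# and should be tuned (e.g. with LassoCV) on real data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=1400, n_features=100, n_informative=10,
                       random_state=0)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

selected = np.flatnonzero(lasso.coef_)  # indices of non-zero coefficients
print(f"Lasso kept {selected.size} of {X.shape[1]} features")
```

The surviving columns (`X[:, selected]`) can then be fed to whichever model is built next.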
Sometimes a dataset has very simple patterns for the model to recognize. If you use a complex model on such data, there is a good chance you will get a poor result. This often happens with deep models, when someone tries to fit thousands of parameters to a dataset that could be modeled with a classic machine learning algorithm.
On the other hand, training an ANN-based model requires a reasonable number of samples, and if your model has many inputs, you should apply some dimensionality reduction such as PCA or MNF, or more advanced methods like the Boruta algorithm.
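Of the reduction methods mentioned above, PCA is the simplest to sketch. The 95% explained-variance target below is an assumption for illustration, not a recommendation from this thread:

```python
# Illustrative PCA dimensionality reduction before model fitting.
# n_components=0.95 keeps the smallest number of components that together
# explain 95% of the variance (an assumed, tunable threshold).
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA

X, y = make_regression(n_samples=1400, n_features=100, random_state=0)

pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X)
print("reduced shape:", X_pca.shape)
```

The transformed matrix `X_pca` then replaces the original predictors when fitting the ANN or MLR.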