I am currently coding a neural network on weather data. So, I have only 570 training examples. But, the relation between output and input is highly non linear. How can I reduce overfitting on this data. I am using conjugate gradient descent with line search.