Being pragmatic, we usually solve the overfitting problem by specifying an additional criterion (a regularization term or prior) that is traded off against fitting the training data. Often a fairly crude regularization (ridge or L1, i.e. a Gaussian or Laplacian prior) does a rather good job. And if we happen to have enough data, the problem becomes even less critical.
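To make that concrete, here is a minimal scikit-learn sketch of the ridge (L2) vs. L1 case; the alpha values are arbitrary illustration choices, not recommendations:

```python
# Hedged sketch: ridge (L2 ~ Gaussian prior) vs. lasso (L1 ~ Laplacian prior).
# The alpha values are arbitrary illustration values.
from sklearn.linear_model import Ridge, Lasso
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=50, n_features=100, noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty shrinks all weights towards zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty drives many weights exactly to zero

print("non-zero ridge weights:", (ridge.coef_ != 0).sum())
print("non-zero lasso weights:", (lasso.coef_ != 0).sum())
```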
Another way would be to define the regularization/prior using hyperparameters and learn these too. This may be more robust against parameter misspecification, but it effectively only shifts the problem to a higher level.
If this is not enough, we can validate our learning procedure using techniques like cross-validation, which is a way to adjust the regularization/prior. But this may be computationally expensive.
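A rough sketch of tuning the regularization strength by cross-validation (the alpha grid and the 5-fold setting are just common defaults, not part of the original answer):

```python
# Hedged sketch: choosing the ridge regularization strength by cross-validation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=50, noise=5.0, random_state=0)

# Try a range of alphas and keep the one with the best average validation score.
search = GridSearchCV(Ridge(), param_grid={"alpha": np.logspace(-3, 3, 13)}, cv=5)
search.fit(X, y)
print("best alpha:", search.best_params_["alpha"])
```

This is also where the computational cost mentioned above comes from: every candidate value of the hyperparameter means refitting the model several times.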
In general there is no "solution" to the problem, since we simply cannot know the correct prior distribution over models. So we are forced to choose a prior (whether flat, Gaussian or whatever) based on intuition. And if people say they don't make any assumptions about the prior, they are simply not aware of their implicit assumptions... ;-)
I agree with Matthias. Learning the parameters is not trivial, but it helps a lot in handling overfitting. Furthermore, it also improves the results achieved with a system.
Of course, introducing momentum or, alternatively, some kind of annealing may help as well.
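A minimal sketch of what that can look like in a plain gradient-descent loop; the learning rate, momentum coefficient, and decay schedule below are arbitrary illustration values:

```python
# Hedged sketch: gradient descent with momentum and a simple learning-rate
# annealing schedule, on an ordinary least-squares objective.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)

w = np.zeros(5)
velocity = np.zeros(5)
lr0, momentum = 0.1, 0.9

for t in range(200):
    lr = lr0 / (1.0 + 0.01 * t)          # annealing: decay the step size over time
    grad = X.T @ (X @ w - y) / len(y)    # gradient of the mean squared error
    velocity = momentum * velocity - lr * grad
    w = w + velocity                     # momentum smooths successive updates

print("error in recovered weights:", np.linalg.norm(w - w_true))
```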
In general, Robert is right: there is no simple solution to this issue.
As several authors have already indicated, there is no simple solution. Here is a bit of a more detailed explanation of the hows and whys:
Overfitting occurs when a model starts describing noise instead of the underlying unknown function we are trying to approximate. The main reasons are:
1) Limited data with respect to the complexity of the model. The VC dimension is a very good measure that describes how complex a model is in terms of active degrees of freedom (related to how many data points the algorithm can shatter), and based on it there are equations that give you a pessimistic estimate of how many observations you need in order to be "probably approximately correct" (a rough form of the bound is sketched after this list). This is explained in several textbooks, and an intuitive introduction is also available, for example, in the lessons here:
http://work.caltech.edu/telecourse
2) The data are especially noisy and outliers are present.
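For reference, one common form of the VC generalization bound (as presented, for instance, in the Learning From Data course linked above; the exact constants vary between textbooks) says that with probability at least 1 - δ:

```latex
E_{\text{out}}(g) \;\le\; E_{\text{in}}(g) \;+\; \sqrt{\frac{8}{N}\,\ln\!\frac{4\big((2N)^{d_{\mathrm{VC}}}+1\big)}{\delta}}
```

The larger the VC dimension relative to the number of samples N, the looser the guarantee, which is exactly the "limited data with respect to model complexity" situation described in point 1).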
Regularization methods, which are one of the general-purpose tools for reducing overfitting, usually take the form of a penalty on complexity, either as a restriction towards smoothness or (as indicated in other answers) as bounds on the vector space norm. The idea is to pay a small penalty on how well we do in the training sample in order to have a significantly better chance to generalize successfully "out of sample" (i.e., on unknown data).
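In symbols, a typical penalized formulation looks like the following, where λ controls the trade-off between training fit and complexity, and the squared norm is just one common choice of penalty (the ridge case mentioned in earlier answers):

```latex
\min_{w}\;\; \frac{1}{N}\sum_{i=1}^{N} \ell\big(f_w(x_i),\, y_i\big) \;+\; \lambda\,\|w\|^{2}
```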
Beyond regularization, other things can be helpful, especially for the second part of the problem (outliers). For example, the effect of methods such as bagging in reducing the impact of outliers (compared to a simple max-margin approach) is explained very intuitively here:
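Independently of that reference, here is a minimal scikit-learn sketch of the bagging idea; the base learner and all hyperparameters are arbitrary illustration choices:

```python
# Hedged sketch: bagging a decision tree. Averaging over bootstrap resamples
# tends to dampen the influence of individual noisy points or outliers.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.1, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

print("single tree CV accuracy :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees CV accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```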
1. use regularization (for optimization-based classifiers). Here, you should take some time to set an appropriate value for the regularization parameter;
2. increase the number of samples in the training set;
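For point 2, a learning curve makes the effect of more data visible: the gap between training and validation score usually shrinks as the training set grows. A rough sketch, with a placeholder dataset and model:

```python
# Hedged sketch: learning curve showing how the train/validation gap
# (a symptom of overfitting) typically narrows with more training data.
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5, random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.2f}  validation={va:.2f}")
```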
As other researchers have noted, you can avoid overfitting by applying a feature selection algorithm or ensemble learning. I would suggest going for ensemble learning methods, either stacking or voting. Both techniques have their pros and cons.
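A minimal sketch of both ensemble strategies with scikit-learn; the base learners and their settings are arbitrary illustration choices, not a recommendation:

```python
# Hedged sketch: voting and stacking ensembles over a few simple base learners.
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("svm", SVC(probability=True))]

# Voting averages the base learners' predictions; stacking trains a meta-model on them.
voting = VotingClassifier(estimators=base, voting="soft")
stacking = StackingClassifier(estimators=base, final_estimator=LogisticRegression())

print("voting CV accuracy  :", cross_val_score(voting, X, y, cv=5).mean())
print("stacking CV accuracy:", cross_val_score(stacking, X, y, cv=5).mean())
```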