I’ve been using Bayesian optimization to find optimal hyperparameters, but the values vary across cross-validated folds. How do you settle on the optimum for the entire dataset?
In a Bayesian technique, the prior distribution is updated based on the data, and the posterior should reflect that. You can draw many samples from the posterior and use their mean, or other summary statistics.
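A minimal sketch of the "draw from the posterior and summarize" idea, assuming a toy conjugate Beta-Binomial model (the model and the numbers are made up for illustration, not taken from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 successes out of 100 trials.
successes, trials = 30, 100

# A Beta(1, 1) prior updated with Binomial data gives a Beta posterior.
alpha_post = 1 + successes
beta_post = 1 + (trials - successes)

# Draw many posterior samples and summarize them.
samples = rng.beta(alpha_post, beta_post, size=10_000)
print("posterior mean:", samples.mean())
print("95% credible interval:", np.percentile(samples, [2.5, 97.5]))
```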
Cross-validation, at its heart, is a method for estimating the generalization ability of your learning algorithm and model; this is usually done by averaging the estimated generalization errors over the cross-validation repetitions (folds). Strictly speaking, that is all cross-validation is for.
It is true that the method, as a side effect, also produces several parameter estimates, so it is very tempting to find a way to use them. Using cross-validation to estimate parameters is therefore a common misuse of it. Cross-validation says nothing about how to combine these estimates (you could take their average, their median, filter outliers first, and so on, and each approach would raise issues in pathological cases).
This being said, I see two options:
- First, only use cross-validation for its intended purpose, i.e., to estimate the generalization ability of your method. If the estimated generalization error proves to be small, then you can proceed with parameter estimation using a classical method on the whole data set.
- Second, try to use cross-validation for parameter estimation as well. This is "uncharted, unprincipled mathematical territory", but it can be done if you verify that you are not in some pathological case. For instance, if the parameters from all folds fall nicely into a small region of parameter space, then averaging them is probably safe; if only a few folds produce aberrant parameter values, maybe you can drop those folds and then average, etc. Proceed at your own risk! (A rough sketch of both options follows below.)
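To make the two options concrete, here is a rough sketch in scikit-learn. GridSearchCV stands in for whichever Bayesian optimizer you actually use (skopt, optuna, ...); the estimator, the parameter grid, the synthetic data, and the "within one decade" agreement check are all assumptions made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}

# Option 1: nested CV. The outer loop estimates the generalization error of
# the whole procedure "tune hyperparameters, then fit"; the inner search is
# the tuning step itself.
inner_search = GridSearchCV(SVC(), param_grid, cv=5)
outer_cv = KFold(5, shuffle=True, random_state=0)
outer_scores = cross_val_score(inner_search, X, y, cv=outer_cv)
print("estimated generalization accuracy:", outer_scores.mean())

# If that estimate looks acceptable, run the search once on the whole data
# set and keep those hyperparameters for the final model.
final_search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print("hyperparameters chosen on the full data:", final_search.best_params_)

# Option 2: collect the per-fold optima and only combine them if they cluster.
per_fold_params = []
for train_idx, _ in KFold(5, shuffle=True, random_state=0).split(X):
    search = GridSearchCV(SVC(), param_grid, cv=5).fit(X[train_idx], y[train_idx])
    per_fold_params.append(search.best_params_)

log_C = np.log10([p["C"] for p in per_fold_params])
print("per-fold best C values:", [p["C"] for p in per_fold_params])
if log_C.max() - log_C.min() <= 1.0:  # heuristic: all optima within one decade
    print("folds agree; averaged log10(C):", log_C.mean())
else:
    print("fold optima disagree; averaging them would be risky")
```

Note that averaging on a log scale (as in the last step) is itself a modelling choice; the point is only that any such combination rule should be checked against the spread of the per-fold optima before you trust it.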