In general terms, Bayesian estimation provides better results than MLE. Is there any situation where maximum likelihood estimation (MLE) gives better results than Bayesian estimation methods?
I think the answer may vary depending on what you consider to be better results. In your case, I will assume that you are referring to better results in terms of smaller bias and mean squared error. As stated above, if you have poor knowledge and assume a prior that is very far from the true value, the MLE may return better results. In terms of bias, if you work hard you can remove the bias of the MLE using formal rules and get better results in terms of both bias and MSE. But if you would like to look at it as point estimation, the MLE can be seen as the MAP estimate when you assume a uniform prior.
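As a minimal sketch of the first point (the normal model and all numbers are my own assumptions, not taken from the answer above), the following Monte Carlo comparison uses a conjugate normal model with known variance and a fairly informative prior centred far from the true mean; the MLE then comes out ahead on both bias and MSE:

```python
import numpy as np

# Hedged sketch with invented numbers: normal data with known sigma.
# The MLE of the mean is the sample mean; the Bayes estimator is the
# posterior mean under a conjugate Normal(m0, s0^2) prior.
rng = np.random.default_rng(1)
theta_true, sigma, n, reps = 0.0, 1.0, 20, 10_000
m0, s0 = 10.0, 1.0   # fairly informative prior centred far from theta_true

x = rng.normal(theta_true, sigma, size=(reps, n))
mle = x.mean(axis=1)                                    # sample mean per replication
post_prec = 1 / s0**2 + n / sigma**2                    # posterior precision
bayes = (m0 / s0**2 + n * mle / sigma**2) / post_prec   # posterior mean

for name, est in [("MLE", mle), ("Bayes", bayes)]:
    bias = est.mean() - theta_true
    mse = ((est - theta_true) ** 2).mean()
    print(f"{name:5s}  bias = {bias:+.3f}   MSE = {mse:.3f}")
```

Centring the prior near the true value (or making it much vaguer) reverses the conclusion, which is exactly the dependence on prior quality described above.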
On the other hand, the question is much more profound in terms of treating your parameter as a random variable and including that uncertainty in your inference. This kind of approach may assist you in constructing the model, especially if you have a complex structure; for instance, hierarchical models (with many levels) are handled much more easily under the Bayesian approach.
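To make the hierarchical-model point concrete, here is a hedged sketch (the data, group structure, and variable names are invented, and PyMC is just one of several probabilistic-programming libraries one could use) of a two-level model that would be considerably more work to fit by maximum likelihood:

```python
import numpy as np
import pymc as pm

# Invented example data: 5 groups, 20 observations each
rng = np.random.default_rng(0)
group = np.repeat(np.arange(5), 20)
y = rng.normal(loc=0.5 * group, scale=1.0)

with pm.Model() as hierarchical_model:
    # Hyperpriors: population-level mean and spread of the group effects
    mu = pm.Normal("mu", 0.0, 5.0)
    tau = pm.HalfNormal("tau", 5.0)
    # Group-level effects drawn from the population distribution
    theta = pm.Normal("theta", mu, tau, shape=5)
    # Observation noise and likelihood
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Normal("obs", theta[group], sigma, observed=y)
    idata = pm.sample()   # posterior for all levels via MCMC
```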
The answer depends on the prior information: a highly informative prior leads to more precise posterior summaries than a less informative one. The MLE can be more precise if the prior is flat (noninformative), as Pedro Luiz Ramos mentioned.
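A small numerical sketch of this point (the numbers are invented, and the conjugate normal model is my own choice for illustration): the posterior standard deviation of a normal mean shrinks as the prior becomes more informative, and approaches the MLE's standard error as the prior becomes flat.

```python
import numpy as np

# Posterior sd of a normal mean under a conjugate Normal prior, for priors
# of varying informativeness.  As the prior sd grows (less informative),
# the posterior sd approaches the MLE's standard error sigma / sqrt(n).
sigma, n = 2.0, 25
for s0 in [0.5, 2.0, 10.0, 1e6]:   # prior sd, from informative to nearly flat
    post_sd = 1 / np.sqrt(1 / s0**2 + n / sigma**2)
    print(f"prior sd = {s0:>9.1f}   posterior sd = {post_sd:.4f}")
print(f"MLE standard error (sigma/sqrt(n)) = {sigma / np.sqrt(n):.4f}")
```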
I believe that, in general terms, "maximum likelihood methods" have better and more extensive ways of checking model structure and assumptions associated with them than typical Bayesian estimation methods do. Thus, one might start an analysis in a maximum-likelihood framework for model development, while deploying the "useful" features of Bayesian analysis for the final conclusions.
Maximum likelihood estimation (MLE) should always give better results than the conventional Bayesian method, because the latter is actually flawed, as demonstrated in the preprint "A new modified Bayesian method for measurement uncertainty a..."
The conventional Bayes Theorem in continuous form states that the posterior distribution (PDF) is proportional to the product of the prior distribution (PDF) and the likelihood function. That is:
posterior PDF ∝ prior PDF × likelihood     (1)
In the case of no prior information, a flat prior should be used, both according to the common-sense view that statistical inference (e.g. measurement uncertainty analysis) should rely on the current measurement (data) itself, and according to Jaynes' maximum entropy principle. If a flat prior is used, formula (1) becomes:
posterior PDF = standardized likelihood function     (2)
Apparently, formula (2) is wrong because a likelihood function is NOT a probability distribution. Fisher (1921) stated, "… probability and likelihood are quantities of an entirely different nature." Edwards (1992) stated, "… this [likelihood] function in no sense gives rise to a statistical distribution." Thus, not only does formula (2) conflict with common sense, it is also methodologically flawed. However, Bayesians usually do not accept flat priors as non-informative priors. Instead, Bayesians often use non-informative priors such as Jeffreys priors. However, the validity of Jeffreys priors has been disputed even among Bayesians. D'Agostini (1998), a leading proponent of Bayesian methods in particle physics, argued "…it is rarely the case that in physical situations the status of prior knowledge is equivalent to that expressed by the Jeffreys priors, …".
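For readers who want to see the mechanics being discussed, here is a minimal grid-based sketch (the binomial data, k = 7 successes in n = 10 trials, are invented for illustration) of how formula (1) with a flat prior reduces numerically to formula (2); whether that reduction is conceptually valid is, of course, the point at issue above.

```python
import numpy as np

# Grid sketch of formulas (1) and (2) with invented binomial data
# (k = 7 successes in n = 10 trials), evaluated on a grid of p values.
p = np.linspace(0.001, 0.999, 999)
n, k = 10, 7
likelihood = p**k * (1 - p)**(n - k)

flat_prior = np.ones_like(p)
posterior_1 = flat_prior * likelihood
posterior_1 /= posterior_1.sum()               # formula (1), normalized on the grid
posterior_2 = likelihood / likelihood.sum()    # formula (2): standardized likelihood

print(np.allclose(posterior_1, posterior_2))   # True: identical on the grid
```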
I recently discovered that formula (1) (i.e. the conventional Bayesian method) is flawed because it violates "the principle of self-consistent operation" (Huang 2020). I provided a rigorous derivation of a correct formula based on the frequentist-Bayesian transformation rule and the law of aggregation of information (LAI). In light of the frequentist-Bayesian transformation rule and the LAI, frequentist and Bayesian inference are virtually equivalent, so they can be unified, at least in measurement uncertainty analysis. The unification is of considerable interest because it may resolve the long-standing debate between frequentists and Bayesians.
In fact, Eq. (1) is not the original Bayes Theorem; it is known as the "reformulated" Bayes Theorem by some authors. The original Bayes Theorem is merely a statement of conditional probability (distribution); it is a self-consistent operation because it operates entirely on probability distributions. Please refer to the updated preprint, Huang (2020).
D'Agostini, G. (1998). Jeffreys priors versus experienced physicist priors: arguments against objective Bayesian theory. Proceedings of the 6th Valencia International Meeting on Bayesian Statistics (Alcossebre, Spain, May 30-June 4).
Edwards, A. W. F. (1992). Likelihood (expanded edition). Baltimore: Johns Hopkins University Press.
Fisher, R. A. (1921). On the "probable error" of a coefficient of correlation deduced from a small sample. Metron, 1(4), 3-32.
Huang, H. (2020). A new Bayesian method for measurement uncertainty analysis and the unification of frequentist and Bayesian inference. Preprint, DOI: 10.13140/RG.2.2.35338.08646. Available on ResearchGate: https://www.researchgate.net/publication/344552280_A_new_Bayesian_method_for_measurement_uncertainty_analysis_and_the_unification_of_frequentist_and_Bayesian_inference?channel=doi&linkId=5f7fd8a5458515b7cf71d5ec&showFulltext=true
In addition to the above responses, the Bayesian estimator depends strongly on its prior distribution, and an erroneous prior assumption leads to poor estimates.
Note that if one model or method were better than another in all situations, we would say that the latter is not admissible and must be removed from the options. The methods that are still in the statistical literature are admissible; therefore, each of them is better than its competitors in some situations. Good luck. Babak Jamshidi