It strikes me that likelihood-based approaches are prone to similar dichotomous decisions as traditional hypothesis testing. E.g., in the environmental sciences we now seem to accept that empirical models within a small delta AIC of the best model (commonly delta AIC < 2) are equally well supported, which is itself just another cut-off.
I've never heard this discussed formally in class, so the following is a product of my meandering reading/coding experience. Extensions of likelihood methods are useful for relaxing assumptions about the distribution of parameter estimates. Specifically, you can create a likelihood profile, which gives the exact change in likelihood for 'every' value of a parameter around the optimum. Consequently, if the behavior of the objective likelihood function with respect to a parameter of interest is not well approximated by a conventional t-distribution (and the Hessian matrix), the likelihood profile gives a more exact interval at any chosen confidence level. Thus, there is less reliance on asymptotic distributions in this sort of inference. However, I think that reporting a single critical/confidence interval always requires an arbitrary cut-off selection. The only difference is which distribution the critical values come from (for a likelihood profile it is chi-squared instead of t). If this comment is useful, I'd be happy to elaborate. I'm also working on a somewhat related paper.
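To make the profiling idea concrete, here is a minimal sketch of my own (not from any particular paper), assuming an exponential model fitted to a small simulated sample; it compares a Wald interval built from the Hessian/observed information with a profile-likelihood interval that uses the chi-squared cut-off. All names and numbers are invented for the example.

```python
# A minimal sketch, assuming an exponential model for a small simulated sample.
# It compares a Wald (Hessian-based) interval with a profile-likelihood interval
# that uses the chi-squared cut-off. All numbers are invented for illustration.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 0.7, size=15)    # small sample, "true" rate 0.7
n, s = len(x), x.sum()

def loglik(lam):
    return n * np.log(lam) - lam * s           # exponential log-likelihood in the rate

lam_hat = n / s                                # closed-form MLE
se_wald = lam_hat / np.sqrt(n)                 # from the observed information n / lam^2
wald_ci = (lam_hat - 1.96 * se_wald, lam_hat + 1.96 * se_wald)

# Profile interval: rates whose deviance 2*(l(MLE) - l(lam)) stays under the cut-off
cut = chi2.ppf(0.95, df=1)

def deviance_gap(lam):
    return 2 * (loglik(lam_hat) - loglik(lam)) - cut

profile_ci = (brentq(deviance_gap, 1e-6, lam_hat), brentq(deviance_gap, lam_hat, 50.0))

print("Wald 95% CI:   ", wald_ci)
print("Profile 95% CI:", profile_ci)           # asymmetric, reflecting the skewed likelihood
```

The profile interval comes out asymmetric because the log-likelihood is skewed at this sample size, which is exactly where the quadratic Wald approximation is least trustworthy.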
The likelihood function is nothing but the marriage of the data and a probability model. It allows us to profile out nuisance parameters and to express the interesting parameters as a function of predictors whose coefficients we want to know. The maximum likelihood estimates (MLEs) of these coefficients are not "good values"; that is, they are no "better" than the sample. They are used essentially just to normalize the likelihood we get for any other vector of coefficients. One can derive the probability distribution of the normalized likelihood values, which is used to formulate likelihood-ratio tests about coefficients. All these t-tests, F-tests, and Chi²-tests can be shown to be nothing else but likelihood ratio tests. There is no difference in principle between using a proxy like t-, F-, or Chi² values and using the likelihood ratio directly (as with the AIC, where a difference in AIC is, up to the penalty term, a difference of log likelihoods and thus a log likelihood ratio).
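As a small numerical check of that equivalence (my own sketch with invented data, not part of the original post): for nested Gaussian linear models the F statistic, the likelihood ratio statistic, and a difference in AIC are exact monotone transformations of one another.

```python
# A small numeric sketch with invented data: for nested Gaussian linear models,
# the F statistic, the likelihood ratio and a difference in AIC are monotone
# transformations of one another.
import numpy as np

rng = np.random.default_rng(0)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)                # x2 has no true effect

def rss(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # ordinary least squares fit
    return np.sum((y - X @ beta) ** 2)

X0 = np.column_stack([np.ones(n), x1])                 # restricted model (drops x2)
X1 = np.column_stack([np.ones(n), x1, x2])             # full model
rss0, rss1 = rss(X0), rss(X1)
q, p = 1, X1.shape[1]                                  # one tested coefficient

F  = ((rss0 - rss1) / q) / (rss1 / (n - p))            # classical F statistic
LR = n * np.log(rss0 / rss1)                           # 2*(loglik_full - loglik_restricted), sigma profiled out
dAIC = LR - 2 * q                                      # AIC_restricted - AIC_full

print(F, LR, n * np.log(1 + q * F / (n - p)), dAIC)    # third value recovers LR exactly from F
```

The last print line recovers the likelihood ratio exactly from the F statistic via LR = n*log(1 + q*F/(n - p)), and the AIC difference is just that likelihood ratio minus twice the number of tested coefficients.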
Using the likelihood to test coefficients is, in this sense, only of theoretical interest: it shows how all these tests follow from one consistent concept, no matter what probability model is assumed for the data or what predictor function is used.
The likelihood function as a whole becomes practically relevant when it is used to integrate the information from the data (the information relevant to the model) into an existing probability model about the coefficients - that is, in a Bayesian analysis. There it really is about estimation of the coefficients. The MLEs are only asymptotic estimators, in the sense that the MLE and the Bayesian posterior expectation (as well as the posterior mode) converge as the sample size goes to infinity.
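A quick conjugate sketch of that asymptotic claim (my example, assuming a Beta(2, 2) prior on a binomial proportion, not anything from the post): the posterior mean and posterior mode approach the MLE as the sample grows.

```python
# A conjugate Beta-Binomial sketch, assuming a Beta(2, 2) prior on a proportion:
# the posterior mean and mode approach the MLE as the sample size grows.
import numpy as np

rng = np.random.default_rng(42)
true_p, a, b = 0.3, 2.0, 2.0

for n in (10, 100, 1000, 10000):
    k = rng.binomial(n, true_p)
    mle = k / n
    post_mean = (a + k) / (a + b + n)                  # mean of Beta(a + k, b + n - k)
    post_mode = (a + k - 1) / (a + b + n - 2)          # mode (MAP) of the same posterior
    print(f"n={n:6d}  MLE={mle:.4f}  post. mean={post_mean:.4f}  post. mode={post_mode:.4f}")
```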
The problem with p-values is that they give a veneer of simplicity to statistical testing. They are a simple target to aim for, and if you reach it, the result is easy to justify. They do have very strong value in interpreting science, but they are not enough for full understanding, precisely because they only look at one side of the equation. Researchers should be made to go beyond simply claiming significance.
Personally, I think the state of the p-value isn't a reflection of its usefulness, but rather of the human need to simplify complex issues and pass quick judgements. This has led to it being used in ways it was never meant to be. Likelihood-based approaches are at risk of the same erosion.
One useful aspect of likelihood-based approaches has been that they encourage more thoughtful analysis of results. If de facto standards become accepted for likelihood-based approaches, then they will surely run the risk of problems equivalent to those of the p-value (cut-off hacking, I guess, being one), as you fear. Whether a likelihood ratio is large enough to be practically useful is context dependent: in a classification test, for example, the costs and benefits of capturing or missing a true positive are all different, and the same goes for the negatives. The minimum likelihood ratio required for a test to be useful will depend on the balance of these costs and benefits and on the population prevalence of positives and negatives. Basically, compared to a p-value, it shifts the point at which practical significance is determined from looking at effect size to setting an acceptance threshold.
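To make the cost/prevalence point concrete, here is a rough sketch under a standard Bayes decision rule (the costs and prevalence are invented numbers, purely for illustration): the likelihood ratio needed to act on a "positive" call grows as false positives get costlier or as the condition gets rarer.

```python
# A rough sketch of a Bayes decision rule; the costs and prevalence are invented.
prevalence = 0.02    # assumed P(positive) in the population
cost_fp    = 1.0     # assumed cost of acting on a false positive
cost_fn    = 20.0    # assumed cost of missing a true positive

# Act on a "positive" only when P(data | positive) / P(data | negative) exceeds this:
lr_threshold = (cost_fp * (1 - prevalence)) / (cost_fn * prevalence)
print(f"Required likelihood ratio > {lr_threshold:.2f}")   # about 2.45 with these numbers
```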
There is still the same tug of war between scientific meaning and practical utility no matter what statistics are used to analyse a situation. And there will always be debate around dichotomising any variable - ultimately, any framework chosen can be traced back to biases and assumptions, no matter how fancy the maths used to set the decision line.
A reference you might find useful (it advocates Bayesian hypothesis testing but does not try to hide the very many pitfalls, even when restricting the discussion to the t-test ...)
Maximum likelihood estimates are asymptotic estimators; this means that as the sample size approaches infinity, the maximum likelihood estimates converge (under the usual regularity conditions).
Interestingly, the MLEs are those values of the prior on the parameters (coefficients) at which the information w.r.t. the parameters is minimal, and, up to some constant, the log likelihood at the MLE just gives that information value. In a comparison of a restricted model and a full model, the constant cancels out, so that the difference in the minimum information values is simply the difference in the log likelihoods at the MLEs (or the log of the likelihood ratio at the MLEs). This is thus a measure of how much information in the data would go unused in the restricted model. The aim of the likelihood ratio test is therefore to test how probable this is, given the restricted model. The interesting point is that all of this can be understood only when we see probability as representing a "state of knowledge", which can be about observable values of a response as well as about parameter values, or about both together in a joint distribution (from which Bayes' theorem can be derived).
Thus, although Fisher tried his best to avoid any reference to probability distributions over parameters and priors, his likelihood approach is in fact founded on just that. Stating that the Bayesian approach is nonsense thus implies that the likelihood approach is nonsense, too. The Bayesian approach just makes use of an explicit prior into which the information from the data is then incorporated.
I am curious about your response tearing apart my post :)
I would also be thankful for anyone pointing out mistakes or misconceptions I might have.
----
PS: Regarding the interpretation of the MLEs (I think you agree here). For Fisher (let's say, for a frequentist), the MLEs are not estimating parameter values but estimating a maximum likelihood value. Without back-reference to Bayes (interpreting a probability distribution as representing a state of knowledge that can be about parameter values as well), the actual values of the MLEs have no particular meaning; there is nothing telling us how "good" these values are. The parameters remain unknown, and nothing in frequentist theory tells us what we "know" or "believe" about their values. It is the same with confidence intervals: nothing tells a frequentist if or how likely it is that a given confidence interval contains the "true" parameter value. The particular interval can be wrong to any extent; that is just unknown. It is only the procedure by which these intervals are constructed that has a coverage probability. Using a particular interval to justify one's belief that the true parameter is somewhere in there is the entrance into a Bayesian interpretation of probability (and implicitly assumes a flat prior, which might not be a good choice).
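A brief simulation of that coverage point (my own illustration, with made-up numbers): about 95% of intervals constructed this way contain the true mean, but any single realized interval either does or does not, and frequentist theory assigns no probability to that event.

```python
# A brief coverage simulation with made-up numbers: the ~95% is a property of the
# construction procedure, not of any single realized interval.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(7)
true_mu, sigma, n, reps = 5.0, 2.0, 20, 10_000
tcrit = t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(reps):
    x = rng.normal(true_mu, sigma, size=n)
    half = tcrit * x.std(ddof=1) / np.sqrt(n)          # half-width of the 95% t-interval
    covered += (x.mean() - half <= true_mu <= x.mean() + half)

print(f"Empirical coverage: {covered / reps:.3f}")     # close to 0.95 by construction
```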