Consider data drawn from a one-parameter distribution (e.g. Poisson) that can be parametrized by its mean µ. The task is to get the 95% confidence interval (CI) for µ.
According to Wilks' theorem, the CI is the set of all values of µ for which the log-likelihood L stays above a threshold, L(µ) > L(µ̂) − c, where µ̂ is the maximum-likelihood estimate and c is half of the 0.95-quantile of a χ² distribution with 1 d.f. (c ≈ 3.84/2 ≈ 1.92).
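For the one-parameter case, a minimal base-R sketch of this recipe for a Poisson mean could look like the following. The counts are made-up illustration data, and using `uniroot()` to find the two points where L(µ) crosses L(µ̂) − c is just one convenient choice.

```r
# Likelihood-ratio (Wilks) 95% CI for a Poisson mean, base R only.
# Illustrative, made-up counts (not data from the question).
counts <- c(3, 7, 2, 5, 4, 6, 1, 8, 3, 5)

# Poisson log-likelihood as a function of the mean mu
loglik <- function(mu) sum(dpois(counts, lambda = mu, log = TRUE))

mu_hat <- mean(counts)              # MLE of the Poisson mean
cutoff <- qchisq(0.95, df = 1) / 2  # c, about 1.92

# g(mu) > 0 exactly on the CI; its two roots are the CI limits
g <- function(mu) loglik(mu) - loglik(mu_hat) + cutoff
lower <- uniroot(g, c(1e-8, mu_hat), tol = 1e-10)$root
upper <- uniroot(g, c(mu_hat, 10 * mu_hat), tol = 1e-10)$root

c(lower = lower, estimate = mu_hat, upper = upper)
```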
If the distribution contains nuisance parameters (e.g. negative binomial or gamma), I thought that they have to be "profiled out" and that the profile log-likelihood is used to get the CI. This is how it is described in Yudi Pawitan's book "In All Likelihood" (see Sections 4.6 and 4.7 for examples).
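In symbols, writing k for the nuisance parameter (the symbol k is introduced here only for illustration; for the negative binomial it would be the size parameter), the profile log-likelihood and the corresponding Wilks interval are:

$$
L_p(\mu) = \max_{k} L(\mu, k), \qquad
\mathrm{CI}_{95\%} = \left\{ \mu : L_p(\mu) > L_p(\hat\mu) - \tfrac{1}{2}\chi^2_{1,\,0.95} \right\}, \qquad
\tfrac{1}{2}\chi^2_{1,\,0.95} \approx 1.92
$$

Note that L_p(µ̂) equals the overall maximum of L(µ, k), so the threshold is the same as in the one-parameter case; only the curve being compared against it changes.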
I noticed that, at least for the negative binomial distribution, R calculates the CI from the log-likelihood with the nuisance parameter held fixed at its maximum-likelihood estimate:
exp(confint(MASS::glm.nb(counts ~ 1)))
gives the limits obtained from Wilks' theorem, but using the log-likelihood with the "size" parameter of the negative binomial fixed at its maximum-likelihood estimate, not the profile log-likelihood. The CI from the profile log-likelihood would be wider.
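To make the comparison concrete, here is a base-R sketch that computes both intervals for an intercept-only negative binomial model directly from `dnbinom()`, without calling `glm.nb` (whose `theta` corresponds to `dnbinom`'s `size`): (a) with size held fixed at its joint maximum-likelihood estimate, and (b) with size profiled out. The counts are made up, the search range for size (exp(-10) to exp(10)) is an arbitrary assumption, and the sketch relies on the fact that the MLE of µ in an intercept-only negative binomial model is the sample mean. It is meant to illustrate the two procedures, not to reproduce `confint()`'s internals.

```r
# Two likelihood-ratio 95% CIs for a negative binomial mean, base R only:
#  (a) size fixed at its joint MLE, (b) size profiled out.
# Illustrative, made-up overdispersed counts (not data from the question).
counts <- c(0, 1, 1, 2, 3, 5, 8, 0, 12, 4, 2, 7, 0, 15, 3)

loglik <- function(mu, size) sum(dnbinom(counts, size = size, mu = mu, log = TRUE))

# Best log(size) for a given mu; the search interval is an assumption
best_log_size <- function(mu) {
  optimize(function(ls) loglik(mu, exp(ls)), c(-10, 10), maximum = TRUE)$maximum
}

mu_hat   <- mean(counts)                # MLE of mu (intercept-only model)
size_hat <- exp(best_log_size(mu_hat))  # joint MLE of size
cutoff   <- qchisq(0.95, df = 1) / 2    # about 1.92
l_max    <- loglik(mu_hat, size_hat)    # overall maximum of the log-likelihood

fixed_ll   <- function(mu) loglik(mu, size_hat)               # (a) plug-in size
profile_ll <- function(mu) loglik(mu, exp(best_log_size(mu))) # (b) profile over size

# Wilks interval: the two points where ll(mu) crosses l_max - cutoff
lr_ci <- function(ll) {
  g <- function(mu) ll(mu) - l_max + cutoff
  c(lower = uniroot(g, c(1e-6, mu_hat), tol = 1e-8)$root,
    upper = uniroot(g, c(mu_hat, 20 * mu_hat), tol = 1e-8)$root)
}

ci_fixed   <- lr_ci(fixed_ll)
ci_profile <- lr_ci(profile_ll)
rbind(fixed = ci_fixed, profile = ci_profile)
```

The profile interval should contain the fixed-size interval: maximizing over size can only raise the log-likelihood at each µ, while both curves share the same maximum at µ̂.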
Why is that? Which procedure is "better" or "more correct": using the profile log-likelihood (as Pawitan describes) or using the log-likelihood with the nuisance parameter fixed at its estimated value?
I hope the question is clear. Thanks for any help!