From a large sample (RCT) I got a binary response X (yes/no) and would like to see whether there is a difference by NIH grant status, which is also a binary response (yes/no).
GLM means generalized linear models, which you can use for a variety of outcomes, not only continuous ones. Given your data, you can thus either use logistic regression or, as you did, GLM with option family=binomial. Both should give you the same, correct results.
What you are doing seems correct. Maybe just a notational thing: "X" in your model is a predictor, not a "response" (as you wrote). I presume you wanted to say that both variables, X and NIH, are binomial. In your model, the binomial variable NIH is the response and the binomial variable X is the predictor.
253266 degrees of freedom indicates that you have a huge data set. If this is so, then looking at p-values makes little sense. You should interpret the estimate instead. The estimate here is about 1.43 (you can get a confidence interval using confint(fit.1way); this will presumably be quite narrow, given that sample size). The estimate is a log odds ratio; the odds ratio is exp(1.43) ≈ 4.18, saying that the odds of getting an NIH grant are roughly 4 times as high when X is TRUE as when X is FALSE. Depending on what X is, this may or may not be relevant. However, keep in mind that getting a grant is one thing, and the size of the grant is another. It could be that X is TRUE for smaller grants, which are awarded more readily and more often because less money is involved. Further, think carefully about whether there is a confounder that could also explain the higher odds of getting a grant (e.g. X is correlated with the NIH budgets for different research topics). And lastly, be careful not to confuse correlation with causation when interpreting your results.
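To make the step from log odds ratio to odds ratio concrete, here is a minimal sketch in plain Python. It relies on the fact that, with a single binary predictor, the logistic regression slope equals the log odds ratio of the 2x2 table. The counts are hypothetical (not the data from the question), and the interval is a simple Wald approximation, so it may differ slightly from what confint() reports in R.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% Wald interval from a 2x2 table.

    a: X TRUE,  grant yes    b: X TRUE,  grant no
    c: X FALSE, grant yes    d: X FALSE, grant no
    """
    log_or = math.log((a * d) / (b * c))           # same value as the logistic slope
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error of the log odds ratio
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper

# Hypothetical counts, chosen only for illustration.
odds_ratio, lower, upper = odds_ratio_with_ci(400, 9600, 1000, 96000)
print(f"odds ratio = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is computed on the log scale and then exponentiated, which is why it is not symmetric around the odds ratio.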
Formally this is called a log-linear model, just in case you'd like to read about the family. Clearly you have a tremendous amount of data, so you are very likely to get a statistically significant result.
Looks correct. But do not get too excited about the p-value when you have a large sample size. Most likely, it is the effect size you are really interested in, and this can be inferred from the estimates.
Patrice Showers Corneli: this is not a log-linear model, as logistic regression uses the logit link function.
A log-linear model is simply a logistic regression with categorical explanatory variables and can use a log or logit link. A logistic model has a categorical response, accommodates both categorical and continuous explanatory variables, and may use a logit or probit link.
To the main point, the question is whether the statistically significant difference between the two groups is also a meaningful difference in a practical sense.
Martin is certainly right. A very small p-value will be obtained from very large studies. But a small p-value is only important if the difference between the groups is large enough to provide insight. A very small difference can always be detected with lots of observations. But whether the difference is meaningful can only be judged by the researcher who understands the size of the effect that is informative about the process being investigated.
A drug tested against a control in a very large study with thousands of participants is expected to be statistically significant. But if it reduces the probability of dying relative to the control group by, say, 3%, it is not really biologically important.
We use statistics to inform our scientific practices. So knowing the magnitude of effect that we would consider important requires scientific knowledge of the study material.
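As a rough illustration of the point about sample size, the sketch below runs a pooled two-proportion z-test on the same small difference (50.0% vs 50.5%) at two group sizes. The proportions and group sizes are made up for the example, and the helper name is hypothetical.

```python
import math

def two_proportion_p_value(p1, p2, n):
    """Two-sided p-value of a pooled z-test, two groups of equal size n."""
    pooled = (p1 + p2) / 2                         # pooled proportion for equal group sizes
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)  # standard error of the difference
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail probability

# Same effect size, very different sample sizes (illustrative values).
p_small = two_proportion_p_value(0.500, 0.505, 1_000)
p_large = two_proportion_p_value(0.500, 0.505, 1_000_000)
print(f"n = 1,000 per group:     p = {p_small:.3f}")
print(f"n = 1,000,000 per group: p = {p_large:.2e}")
```

The difference between the groups is identical in both calls; only the p-value changes, which is why the effect size has to be judged on its own terms.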
Patrice Showers Corneli: That simply is not correct. A logistic regression is a logit-linear model. The logit function serves to map a response variable that has a lower and upper boundary to "linear space" (which has no limits). In other words, it applies to counts with a maximum number. The very name "logistic" comes from the fact that the logistic function is the inverse of the logit.
A log-linear model is a Generalized Linear Model with a logarithmic link function; typically this is Poisson or negative binomial regression. It applies when there is no upper limit to counts. The type of predictors is generally irrelevant for all linear models. See chapter 7.2.1 of my online book: https://www.researchgate.net/project/Book-New-statistics-for-the-design-researcher
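For readers who want to see the two link functions side by side, here is a small sketch in plain Python. The function names are my own, not from any library.

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def logistic(eta):
    """Inverse of the logit: maps any real number back into (0, 1)."""
    return 1 / (1 + math.exp(-eta))

def log_link(mu):
    """Log link: maps a positive mean (no upper bound) to the real line."""
    return math.log(mu)

p = 0.8
eta = logit(p)              # linear predictor scale
round_trip = logistic(eta)  # back on the probability scale
print(f"logit({p}) = {eta:.4f}, logistic(logit({p})) = {round_trip:.4f}")

mu = 7.0                    # e.g. an expected count
print(f"log({mu}) = {log_link(mu):.4f}, exp(log({mu})) = {math.exp(log_link(mu)):.4f}")
```

The logit needs a bounded input, whereas the log link only requires a positive one, which matches the distinction between bounded and unbounded counts.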
That being said:
log-linear models can approximate logit-linear models fairly well when the upper bound is very large (but why would you want an approximation, when doing the real thing is the same effort?)
Patrice Showers Corneli's explanations on p-values versus effect sizes are spot on
when the purpose of modelling is prediction (rather than hypothesis testing), the AIC is the appropriate model score (rather than the p-value).
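On the first point in the list above, a quick numerical sketch shows why the approximation works: logit(p) = log(p) - log(1 - p), and the second term vanishes as p = k/n gets small, i.e. as the upper bound n grows relative to the count k. The values of k and n are arbitrary illustrations.

```python
import math

def link_gap(k, n):
    """Difference between logit(k/n) and log(k/n), which equals -log(1 - k/n)."""
    p = k / n
    return math.log(p / (1 - p)) - math.log(p)

k = 5  # a fixed count (illustrative)
gaps = [link_gap(k, n) for n in (10, 100, 10_000)]
for n, gap in zip((10, 100, 10_000), gaps):
    print(f"k = {k}, n = {n:>6}: logit - log = {gap:.5f}")
```

With a small upper bound the two links disagree noticeably; with a large one they are nearly indistinguishable.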