Is there a max number of explanatory variables you can include in a GLMM based on the N of study?

Fabrice Clerot

the sky is the limit ...

However, great care should be taken when reaching the "p>>n" regime (many more explanatory variables than cases), because overfitting is almost certain; sparsity-inducing regularization is highly recommended:

http://statweb.stanford.edu/~tibs/ftp/lasso-retro.pdf

ftp://ftp.stat.math.ethz.ch/Research-Reports/Other-Manuscripts/buhlmann/glmmLasso.pdf

R package :

http://cran.r-project.org/web/packages/glmmLasso/glmmLasso.pdf

Emma Katherine Wallace

That's fantastic, thank you. I'll go through those links in detail shortly. We have some research questions with N = 150 and about 8 IVs, so it sounds like that might not be too much of a problem.

Daniel Wright

In some areas of study it makes sense to have a very large number of predictors, as in some modern biostatistics. It is good that p>>n methods are available. But it depends on what you want to do with the results, and often the number of predictor variables is more usefully constrained by your theories and how you use them than by the number of subjects and the statistical methods. So if you put all 8 predictors in (and let's assume no interactions, but ...), will it be useful to interpret each of your coefficients as conditioned on the 7 others? That might be tricky.

If all you are doing is trying to predict values, then using all of them is more understandable.

Pelumi Oguntunde

@Katherine: I think 8 IVs are not too many for N=150. Nevertheless, after fitting the model, test whether the coefficients of the IVs are significantly different from zero.

It is tricky to say what significance means for an individual coefficient after using something like the lasso. Even if you just use all eight and do not shrink the model, make sure to interpret them in light of there being 8 tests in the family and of each coefficient being conditional on the rest.

Also, before this thread grows too much, we should ask whether interactions among the predictors will be included, and check that just linear relations are included. The answers to these may complicate the model, but depending on the area of science they may be critical.

The questions relate to a questionnaire study. We are trying to see whether 8 demographic factors (age, education level, etc.) affect certain responses, such as whether respondents get a factual question right or wrong. We will look for interaction effects as well.

Even if you just restrict yourself to 2 and 3 way interactions (8 choose 2 plus 8 choose 3 is 84 so more effects to estimate), this means that there are a lot of predictors (even assuming just linear) and interpretation will be complex. So be careful.
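The arithmetic behind that count can be checked with a minimal sketch. The function name `count_interactions` is made up for illustration, and the sketch assumes each interaction contributes a single term (continuous or two-level predictors).

```python
from math import comb


def count_interactions(n_predictors, max_order):
    """Count interaction terms of order 2 up to max_order among n_predictors."""
    return sum(comb(n_predictors, k) for k in range(2, max_order + 1))


n_predictors = 8
two_way = comb(n_predictors, 2)    # 28 pairs
three_way = comb(n_predictors, 3)  # 56 triples
two_and_three_way = count_interactions(n_predictors, 3)

print(two_way, three_way, two_and_three_way)  # 28 56 84
```

These 84 terms come on top of the 8 main effects.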

On the factual questions, do you have a set of right/wrong questions, so are you using something from item response theory (IRT) to analyze (or analyse, I lived in the UK for 21 years) them?

Timothy A Ebert

Possibly I did not interpret your question appropriately, but it will not work to try and fit Y=X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+e with only ten observations. I think that any model where the degrees of freedom from each variable plus error term equals or exceeds the number of replicates will fail. Interaction terms also use degrees of freedom, so as Daniel pointed out, a few variables can generate a large number of interactions. In Daniel's example, if you use all of the interactions you will have a total of 255 variables in your model. A sample size of 150 will not support a model with all of these. Also, missing cases effectively reduce N. Sometimes missing values are imputed.
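A rough sketch of this degrees-of-freedom bookkeeping follows. It assumes, for simplicity, that every term costs exactly one degree of freedom and that the model has an intercept; categorical predictors with more than two levels, and the random effects in a GLMM, would use up more. The helper names are made up for illustration.

```python
from math import comb


def full_factorial_terms(n_predictors):
    """Main effects plus interactions of every order: 2**p - 1 terms."""
    return sum(comb(n_predictors, k) for k in range(1, n_predictors + 1))


def residual_df(n_obs, n_terms, intercept=True):
    """Degrees of freedom left for error; zero or below means the fit fails."""
    return n_obs - n_terms - (1 if intercept else 0)


terms_all = full_factorial_terms(8)
print(terms_all)                    # 255
print(residual_df(10, 10))          # -1: ten predictors, ten observations
print(residual_df(150, 8))          # 141: main effects only
print(residual_df(150, terms_all))  # -106: all interactions included
```

Any missing cases lower `n_obs` and shrink the remaining degrees of freedom further.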

Timothy, just to clarify: Fabrice's point is that if you use forward selection (or a bunch of other techniques which are now popular), you can find a solution to predict Y from K variables where K>>N, though of course not by entering them all into a standard regression. Emma's situation is more like the typical traditional problems, but with interactions, and, say, if you wanted to allow splines rather than straight lines, the number of dfs in the model goes way up, so there can be computational problems.

The point I was trying to make in my first comment is that there can be interpretational problems too. Suppose you have a million cases and 8 predictors, and everything solves in a straightforward manner (but the predictors are correlated). It is tricky to interpret what each coefficient means because it is conditional on the other 7. This is a problem that I often have when trying to explain results to others. Phrases like "holding 7 things constant but allowing this one to vary" just don't seem to help if you can't do that in reality.
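A toy illustration of this point, with made-up data and only two correlated predictors rather than eight: the slope for x1 on its own differs from its coefficient once x2 is also in the model. The data, the variable names, and the use of the closed-form two-predictor least-squares solution are all assumptions made for this sketch.

```python
def centered_sums(u, v):
    """Sum of cross-products of u and v about their means."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


# Made-up data: x2 tracks x1 closely, and y = 2*x1 - x2 exactly (no noise).
x1 = [0, 1, 2, 3, 4, 5]
x2 = [a + d for a, d in zip(x1, [1, -1, 1, -1, 1, -1])]
y = [2 * a - b for a, b in zip(x1, x2)]

s11, s22, s12 = centered_sums(x1, x1), centered_sums(x2, x2), centered_sums(x1, x2)
s1y, s2y = centered_sums(x1, y), centered_sums(x2, y)

# Slope of y on x1 alone (x2 left out of the model).
marginal_b1 = s1y / s11

# Least-squares coefficients with both predictors in the model.
det = s11 * s22 - s12 ** 2
conditional_b1 = (s22 * s1y - s12 * s2y) / det
conditional_b2 = (s11 * s2y - s12 * s1y) / det

print(round(marginal_b1, 3), round(conditional_b1, 3), round(conditional_b2, 3))
```

The conditional coefficient for x1 recovers the 2 used to build y, while the marginal slope is noticeably smaller because x2 moves together with x1. With 8 correlated predictors the same gap appears for every coefficient.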

Daniel,

Thank you for the clarification.