What are the consequences of having a lot of regressors in comparison to the number of observations?

David Eugene Booth

It sure does. Drop, say, 10 observations and OLS probably won't even run. If this is a predictive model, do adaptive lasso variable selection (with all the categorical variables, you need adaptive group lasso). If you really need all those IVs, you have to get a bigger n. Good luck. D. Booth. See the attached two papers. They might give you some ideas.
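To make the adaptive lasso suggestion concrete, here is a minimal sketch in Python with NumPy only. Everything specific in it is an assumption made for illustration: the synthetic data, the ridge pilot estimate, and the fixed penalty `alpha` (in practice it would be tuned, for example by cross-validation). It is the plain adaptive lasso, not the group version recommended above for categorical variables.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/(2n))*||y - Xb||^2 + alpha*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - Xb, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * b[j]      # partial residual without feature j
            rho = X[:, j] @ resid / n
            # soft-thresholding update for coordinate j
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]
    return b

def adaptive_lasso(X, y, alpha, gamma=1.0, ridge=1.0):
    """Two-step adaptive lasso; assumes X and y are already centred."""
    p = X.shape[1]
    # Step 1: ridge pilot estimate (an assumption; stable when p is close to n).
    b_init = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    # Step 2: a penalty weight of 1/|b_init_j|**gamma on coefficient j is
    # equivalent to rescaling column j by |b_init_j|**gamma and running a
    # plain lasso, then mapping the coefficients back.
    scale = np.abs(b_init) ** gamma
    b_scaled = lasso_cd(X * scale, y, alpha)
    return b_scaled * scale

# Synthetic example (hypothetical data): 60 observations, 20 candidate
# regressors, only the first three truly matter.
rng = np.random.default_rng(0)
n, p = 60, 20
X = rng.normal(size=(n, p))
true_b = np.zeros(p)
true_b[:3] = [3.0, -2.0, 1.5]
y = X @ true_b + rng.normal(scale=0.5, size=n)
X = X - X.mean(axis=0)
y = y - y.mean()

b_hat = adaptive_lasso(X, y, alpha=0.1)
selected = np.flatnonzero(b_hat)
print("selected columns:", selected)
```

The column rescaling is one common way to get coefficient-specific penalties out of an ordinary lasso solver; a dedicated package would normally be preferable for real work.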

Jos Feys

There are numerous rules of thumb:

https://www.researchgate.net/post/what_are_the_consequences_of_having_a_lot_of_regressors_in_comparison_to_the_number_of_observations?isAnswerFieldFocused=true

https://www.researchgate.net/post/How_much_sample_size_required_for_applying_linear_regression_model_for_each_predictor

Guy Mélard

Are these categorical dummies related to one categorical variable or to several categorical variables? In the former case, grouping (based on knowledge, not on the results) can be considered. In the latter case, suppose you have m categorical variables. You can try taking a subset of m - 1 of them (that means omitting one), i.e. m regressions, and see if the results are stable. Maybe you can omit one of them. I remember having seen good results with 48 observations and 20 variables in Vatter et al. (1978).
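The "omit one categorical variable at a time" check can be sketched as follows in Python with NumPy. The data are synthetic, and the names `x_main`, `industry` and `province`, the number of levels, and the true coefficient of 2.0 are all hypothetical choices for the illustration. The idea is simply to refit the regression once per omitted block of dummies and compare the coefficient of interest across fits.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficients with an intercept in position 0."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def dummies(codes, n_levels):
    """One-hot encode integer codes, dropping the first level as reference."""
    return np.eye(n_levels)[codes][:, 1:]

def leave_one_block_out(x_main, blocks, y):
    """Coefficient on x_main with all dummy blocks, then omitting each block."""
    fits = {"all blocks": ols_coefs(np.column_stack([x_main, *blocks.values()]), y)[1]}
    for name in blocks:
        kept = [d for k, d in blocks.items() if k != name]
        fits["without " + name] = ols_coefs(np.column_stack([x_main, *kept]), y)[1]
    return fits

# Hypothetical data: 80 firms, 5 industries, 3 provinces.
rng = np.random.default_rng(1)
n = 80
industry = rng.integers(0, 5, size=n)
province = rng.integers(0, 3, size=n)
x_main = rng.normal(size=n)
y = (2.0 * x_main
     + rng.normal(size=5)[industry]   # industry effects
     + rng.normal(size=3)[province]   # province effects
     + rng.normal(size=n))            # noise

blocks = {"industry": dummies(industry, 5), "province": dummies(province, 3)}
stability = leave_one_block_out(x_main, blocks, y)
for label, coef in stability.items():
    print(f"{label:>18}: {coef:.3f}")
```

If the coefficient of interest moves a lot when one block is dropped, that block is doing real work as a control; if it barely moves, omitting or coarsening it may be defensible.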

Timea De Wispelaere

Thank you for your answers! Guy Mélard: I have a categorical variable for the first two digits of the NACE code of the firm's industry, which results in 26 dummies, and one categorical variable, Province, which results in 10 dummies. Because they relate to only a few categorical variables, I thought this would not be problematic. So what you suggest is that I regroup my industry dummies into, for example, divisions, and my province dummies into, for example, regions?

No. Of course you can try to regroup several NACE sectors and/or several provinces. For example, for provinces you can consider North, West, South and East, and similarly for the sectors. But my second suggestion was to use only NACE sectors, on the one hand, and only provinces, on the other hand, and see if the results are stable. Anyway, in principle you should also consider interactions (i.e. all products of the NACE and Province dummy variables), but then it is hopeless because you will have 26*10 = 260 dummies in all. In my count I did not include the constant.

David Eugene Booth: I am struggling to determine when the number of regressors is too big. What is the influence on my regression results? What is striking is that the F value points to joint insignificance. Is this a consequence of the small sample size, and does this mean I cannot interpret the results? However, I cannot increase n, since I am doing research on the whole sample.

Maurice Ekpenyong

If your model contains two predictors and the interaction term, you’ll need 30-45 observations. However, if the effect size is small or there is high multicollinearity, you may need more observations per term. Compare this with the 48 regressors in your study.
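As a rough worked version of this rule of thumb: 30-45 observations for three terms (two predictors plus their interaction) corresponds to about 10 to 15 observations per term. That per-term reading is an inference from the numbers above, not an exact rule, and the function names below are made up for the illustration.

```python
def required_n(n_terms, per_term):
    """Observations suggested by an 'observations per term' rule of thumb."""
    return n_terms * per_term

def n_range(n_terms, low=10, high=15):
    """Lower and upper suggestion, assuming 10 to 15 observations per term."""
    return required_n(n_terms, low), required_n(n_terms, high)

print(n_range(3))   # two predictors plus interaction -> (30, 45)
print(n_range(48))  # 48 regressors -> (480, 720)
```

Under the same assumption, a model with 48 regressors would call for several hundred observations, which is the comparison being drawn here.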

Timea De Wispelaere, a couple of points. First, the rules of thumb are data dependent, so sometimes 5 per IV is OK and sometimes 10 is OK; there is no single magic value. More is better. If you are forming an explanatory model, that's about all you can say, except that the usual power and sample size calculations can be done: do one for each term and then take a number greater than max(n_i). For predictive models we can do a little better. I currently like lasso for various reasons. 1. Adaptive lasso has an oracle property (i.e. the model gives you the best predictor set from among the candidate variables by using cross-validation with an information criterion like BIC or AIC). That's good for predictive models. Further lasso models run for n

Hello David. Thank you for your valuable answer. However, the main cause of the high number of regressors in my model is the control variables industry (26 variables) and province (10 variables). Should I also use this lasso approach, knowing that the majority of my regressors are there only as controls?