Can I leave some dummies out of a backward selection so that they appear in the final model?

James R Knaub

Johanna -

I would not suggest backward or forward selection at all. The best set of predictors is not likely to be one of the combinations which you would find that way.

It is important not only to use neither too few nor too many predictors, but to use the right ones. They act together, generally with more collinearity and other problematic relationships than would be ideal.

There are other methods, which you can find by researching "model selection," but your subject matter expertise could help you decide on some alternative models that make sense. However you arrive at two or more alternative models, you could compare their fits to the same sample on the same scatterplot, using a "graphical residual analysis," which I often suggest. To avoid overfitting to a particular sample, such that your selected model might fit other parts of the population or subpopulation much worse, you could research "cross-validation." If you have enough data for two or more different samples, you might accomplish all of this by comparing the model results on a separate scatterplot for each sample.
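To make the cross-validation suggestion above concrete, here is a minimal sketch in Python (NumPy only). The data are simulated and the two candidate models are hypothetical; nothing here comes from the original question. It compares the out-of-fold squared prediction error of two alternative OLS models on the same sample.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Average out-of-fold mean squared error of an OLS fit of y on X."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for test in folds:
        train = np.setdiff1d(idx, test)          # all rows not in this fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return float(np.mean(errors))

# Simulated data (assumption): y depends on x1 and a group dummy, not on x2.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
dummy = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 2.0 * x1 + 1.5 * dummy + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
model_a = np.column_stack([ones, x1, dummy])   # candidate A: x1 + dummy
model_b = np.column_stack([ones, x2])          # candidate B: x2 only

cv_a = kfold_mse(model_a, y)
cv_b = kfold_mse(model_b, y)
print(f"CV MSE, model A: {cv_a:.3f}; model B: {cv_b:.3f}")
```

The candidate with the lower out-of-fold error predicts unseen rows better. This complements a graphical residual analysis and does not replace it.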

Note that a graphical residual analysis used to study fit includes considering heteroscedasticity, which should be modeled also. Heteroscedasticity is expected when predictions are of different sizes, which is the general case. The best set of predictors may not be good enough to mimic the behavior of the y-variable, however, which is discussed in the following:

https://www.researchgate.net/publication/352134279_When_Would_Heteroscedasticity_in_Regression_Occur.

Note that OLS regression is just a special case of weighted least squares (WLS) regression, where weights are equal, but this may be far from realistic. (When you have autocorrelation too, then you need GLS regression.)
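The statement that OLS is WLS with equal weights can be checked numerically. The sketch below uses simulated heteroscedastic data, and the weights 1/x^2 are an assumption made for illustration (error standard deviation taken to be proportional to x). It is not a recommendation for any particular data set.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimise sum_i w_i * (y_i - x_i' beta)^2."""
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w_i) turns the weighted problem into an ordinary one.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(1.0, 10.0, size=n)
# Hypothetical data: the error standard deviation grows with x.
y = 3.0 * x + rng.normal(scale=0.5 * x, size=n)
X = np.column_stack([np.ones(n), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_equal = wls(X, y, np.ones(n))     # equal weights reproduce OLS
beta_wls = wls(X, y, 1.0 / x**2)       # weights proportional to 1 / variance

print("OLS:          ", beta_ols)
print("WLS, equal w: ", beta_equal)
print("WLS, 1/x^2 w: ", beta_wls)
```

With equal weights the two fits coincide. With unequal weights the coefficients generally differ, because observations with larger error variance are down-weighted.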

Best wishes - Jim

If you insist on backward selection, though such sequential selection procedures are not recommended, I would guess that leaving dummies out at the beginning, thus grouping some data that you would not have wanted grouped, could be a (further) problem with selecting predictors this way. Maybe you'd want to start with them included, and never exclude them?

Johanna Mello

Dear James, thank you so much for your useful answer! We have to use backward selection and indeed, I wish to include these dummies at the start and I do not want them to be excluded from the regression.

I don't know whether this is possible.

Thank you!

I don't know why you'd want to use that selection method, but if so and your software somehow wants to treat dummy variables as if they weren't dummies, then maybe it would accept an if-then kind of command such as 'if dummy selected for removal, then skip.' - Best wishes.
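The 'if dummy selected for removal, then skip' rule above could look roughly like the following sketch. It is a hand-rolled backward elimination in Python with NumPy, not the routine of any particular package. The function name `backward_select`, the `keep` argument, and the removal rule (drop the unprotected term with the smallest |t| while it is below 2, a rough stand-in for a p-value cutoff of about 0.05) are all assumptions made for illustration, and the data are simulated.

```python
import numpy as np

def backward_select(X, y, names, keep, t_min=2.0):
    """Backward elimination on OLS |t| statistics; terms in `keep` are never dropped."""
    active = list(range(X.shape[1]))
    while True:
        Xa = X[:, active]
        beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
        resid = y - Xa @ beta
        dof = len(y) - len(active)
        sigma2 = resid @ resid / dof
        cov = sigma2 * np.linalg.inv(Xa.T @ Xa)
        t = np.abs(beta / np.sqrt(np.diag(cov)))
        # Only unprotected terms are candidates for removal.
        cand = [i for i, j in enumerate(active) if names[j] not in keep]
        if not cand:
            break
        worst = min(cand, key=lambda i: t[i])
        if t[worst] >= t_min:
            break                      # every remaining candidate passes
        del active[worst]              # drop the weakest unprotected term
    return [names[j] for j in active]

# Simulated data (assumption): a three-level factor coded as two dummies.
rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
group = rng.integers(0, 3, size=n)
d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
y = 1.0 + 2.0 * x1 + 0.1 * d1 + rng.normal(size=n)

names = ["const", "x1", "x2", "x3", "d1", "d2"]
X = np.column_stack([np.ones(n), x1, x2, x3, d1, d2])
selected = backward_select(X, y, names, keep={"const", "d1", "d2"})
print(selected)
```

The protected dummies stay in every step, so they appear in the final model whatever their t statistics are. The usual caveats about sequential selection raised in this thread still apply to the terms that are allowed to drop out.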

Kelvyn Jones

I too would be very wary if you are developing a model for understanding, but your question is a software issue. Minitab, for example, gives you this facility:

https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/regression/how-to/fit-regression-model/perform-the-analysis/perform-stepwise-regression/#potential-terms where the documentation says:

Displays the set of terms that the procedure will assess. Indicators (E or I) next to the term in the list signify how the procedure handles the term. The Method you choose determines the initial settings in this list. You can modify how the procedure handles the terms with the two buttons below. If you don't use these buttons, the procedure can add or remove the term from the model based on its p-value.

E = Include term in every model: Select a term and click this button to force the term into every model regardless of its p-value. Click the button again to remove this condition.
I = Include term in the initial model: Select a term and click this button to include the term in the initial model. The procedure can remove these terms if its p-value is too high. Click the button again to remove this condition.

Thank you James! I will think about other possibilities than backward models, maybe not use OLS at all.

Best wishes.

Daniel Wright

One possibility is combining backwards (or forwards) selection with constraining the sum of the absolute values of the beta values. This is called the lasso (book downloadable from https://web.stanford.edu/~hastie/StatLearnSparsity/), and there are variations of it. But this still has some of the same issues as traditional backwards/forwards selection, so if you have a substantive theory to guide you, that is better.
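For the original question, one variation of the lasso leaves chosen columns unpenalised, so they can never be shrunk out of the model. The following is a minimal coordinate-descent sketch of that idea in Python with NumPy. The function `lasso_cd`, the per-column `penalty` vector, the value `lam=0.5`, and the simulated data are all assumptions for illustration; it also assumes the penalised columns are on comparable scales. A real analysis would more likely use an established lasso implementation that supports per-coefficient penalties.

```python
import numpy as np

def lasso_cd(X, y, lam, penalty, n_iter=500):
    """Coordinate descent for
    (1 / (2n)) * ||y - X b||^2 + lam * sum_j penalty[j] * |b_j|.
    Columns with penalty[j] == 0 are never shrunk to zero by the penalty."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # take column j out of the fit
            rho = X[:, j] @ resid / n
            thresh = lam * penalty[j]
            # Soft-thresholding update for coordinate j.
            beta[j] = np.sign(rho) * max(abs(rho) - thresh, 0.0) / col_ss[j]
            resid -= X[:, j] * beta[j]          # put the updated column back
    return beta

# Simulated data (assumption): x2 and x3 are pure noise, d1 and d2 are dummies.
rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
group = rng.integers(0, 3, size=n)
d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
y = 1.0 + 2.0 * x1 + 0.3 * d1 - 0.2 * d2 + rng.normal(size=n)

names = ["const", "x1", "x2", "x3", "d1", "d2"]
X = np.column_stack([np.ones(n), x1, x2, x3, d1, d2])
penalty = np.array([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])   # 0 = always kept
beta = lasso_cd(X, y, lam=0.5, penalty=penalty)
coef = dict(zip(names, beta))
print(coef)
```

Setting a column's penalty to zero plays the same role as forcing a term into every model in a stepwise routine: the intercept and the dummies are estimated without shrinkage, while the other predictors compete under the constraint.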

Also, it is worth you saying WHY you need to do backwards selection. Is this a class exercise?

For the original question, sure, you can include them, and depending on what software you are using, you can write a function for that.

David Eugene Booth

Some literature suggestions for not using the step methods and what might be useful instead are given in the attached paper. Best wishes, David Booth