When using multiple dummy variables, the constant term becomes larger in absolute value and more significant (smaller p-value)... and, most importantly, you just can't make sense of it, so there is no clear interpretation of the whole regression equation.
If you are doing a traditional regression, then the number of predictors needs to be smaller than the sample size. To get precise and meaningful estimates, it is often useful to have the number of predictors much smaller than that (this is why people use techniques like the lasso to search for sparser solutions). However, the acceptable number depends on the situation, so rules of thumb may not apply.
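As a minimal sketch of the lasso point (Python/scikit-learn, simulated data; the sizes and penalty here are illustrative, not a recommendation):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 40                      # p close to n: OLS estimates are unstable
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]        # only 3 of 40 predictors truly matter
y = X @ beta + rng.normal(scale=0.5, size=n)

# The L1 penalty shrinks most coefficients exactly to zero,
# recovering a sparse solution even though p is nearly n.
lasso = Lasso(alpha=0.1).fit(X, y)
print("nonzero coefficients:", np.flatnonzero(lasso.coef_))
```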
Dummy variables should not cause any problems if they are handled properly. Generally, these variables are qualitative and take the values 1 and 0. These 1s and 0s should not be incorporated as values and mixed with other quantitative data in the regression model; they should be treated separately under discrete probability, and are generally handled as non-parametric. In parametric analysis, these dummy variables are treated separately under a logistic function to answer questions specifically dealing with, say, the DV and gender or age, etc.
For a general predictive function under regression, no matter how many dummy variables you use, there should not be any problem, because they are not treated as "measurable" quantitative values.
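For what it's worth, here is a minimal sketch (Python with pandas/statsmodels, made-up data) of the mechanics: a qualitative variable is expanded into 0-1 columns and fitted like any other regressor.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "y":       [3.1, 4.0, 2.7, 5.2, 4.4, 6.1, 3.5, 5.0],
    "quality": ["low", "med", "low", "high", "med", "high", "low", "med"],
})

# Expand the qualitative variable into k-1 dummies (one level is the baseline).
X = pd.get_dummies(df[["quality"]], drop_first=True, dtype=float)
X = sm.add_constant(X)
print(sm.OLS(df["y"], X).fit().params)
```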
As a rule of thumb, there are three aspects you should worry about:
- the total number of regression parameters (including the intercept) should be smaller than the number of observations (this was already mentioned)
- collinearity in the matrix X (of explanatory variables). If you inflate the number of dummies in X, then X'X may be singular or close to it (at working precision). This happens when dummies have similar 0-1 patterns across observations; the more dummies there are, the greater the chance of this (see the sketch after this list).
- common sense. Theoretically you can have only dummies in your matrix X. But I would be very sceptical about such a model.
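A minimal sketch (Python/numpy, toy design matrix) of the collinearity point: two dummies whose 0-1 patterns are complementary make X'X rank-deficient once an intercept is included (the classic dummy variable trap). The variable names are illustrative only.

```python
import numpy as np

intercept = np.ones(6)
d1 = np.array([1, 0, 1, 0, 1, 0])   # e.g. "male"
d2 = np.array([0, 1, 0, 1, 0, 1])   # e.g. "female" = 1 - male
X = np.column_stack([intercept, d1, d2])

# d1 + d2 equals the intercept column, so X'X loses a rank.
XtX = X.T @ X
print("rank of X'X:", np.linalg.matrix_rank(XtX))  # 2 < 3 columns -> singular
```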
As others have explained, there is no fixed number of dummy variables; the rule of thumb is n>k, where n is the total number of observations and k is the total number of parameters being estimated... Hope this helps!
1. I do not see why one should get larger and more significant constants if one uses dummies in a regression.
2. Dummies should never be used only for statistical reasons (to get a better fit and a higher R2 by eliminating outliers). They should always be well-justified. In that case, it is no problem to interpret the estimated coefficients.
3. Dummies need not take the values 0 and 1; one can also use -1 and 1, and sometimes even more than two numbers (for example: good, medium, bad quality; very high, high, medium, low, very low, and no income). One can, of course, use a separate dummy for each of these categories, but this could leave unduly few degrees of freedom. With simple one-by-one regressions, one could try to find out which categories are relevant and what (rough) numbers should be used as codes for the different categories (see the sketch after this list).
4. Of course, it is true that n>k, but n should be much greater than k. I would say n>2k.
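To illustrate point 3, here is a minimal sketch (Python/numpy, made-up category labels) of the alternative codings mentioned: a 0-1 dummy, a -1/1 coding, and a single ordered score that spends only one degree of freedom on a three-level factor.

```python
import numpy as np

quality = np.array(["bad", "medium", "good", "good", "bad", "medium"])

dummy_01   = (quality == "good").astype(float)        # 0-1 dummy for "good"
effect_pm1 = np.where(quality == "good", 1.0, -1.0)   # -1/1 coding
ordinal    = np.select(                               # one ordered score
    [quality == "bad", quality == "medium", quality == "good"],
    [1.0, 2.0, 3.0],
)  # a single column instead of two dummies: fewer parameters to estimate
print(dummy_01, effect_pm1, ordinal, sep="\n")
```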
Suppose that you have a regression with no dummy variables and an estimate of b for the constant. Now add a dummy variable, say 1 for male and 0 for female. Suppose that you now get a constant of b0 and a coefficient of b1 on the dummy variable. This is equivalent to running separate regressions for males and females while holding the slope coefficients the same for both. The intercept in the male equation is b0+b1 and in the female equation b0. The relative sizes of b, b0, and b1 are determined by the data, and b can be greater or less than b0 depending on the data set.
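A minimal sketch (Python/numpy, simulated data with assumed coefficients) of that equivalence: fitting y on an intercept, a male dummy, and x recovers a female intercept of b0, a male intercept of b0 + b1, and one common slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
male = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=n)
# True model: female intercept 1.0, male intercept 1.5, common slope 2.0
y = 1.0 + 0.5 * male + 2.0 * x + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), male, x])
b0, b1, slope = np.linalg.lstsq(X, y, rcond=None)[0]
print("female intercept:", b0)        # ~1.0
print("male intercept:  ", b0 + b1)   # ~1.5
print("common slope:    ", slope)     # ~2.0
```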