When I use a machine learning algorithm to make regression predictions, why do I find that there is a constant difference between the predicted value and the actual value? Could you tell me what may cause this phenomenon? Is it a model problem or a data problem?
If you are using some flavour of RNN, this happens because the network overfits and simply outputs the value of the previous timestep (and the offset looks constant because it is hard to distinguish point k from point k+1 in a time series of thousands of points). In that case it is likely both a model and a data problem (the model overfits and there is not enough data).
Otherwise, if you are trying to predict point estimates, I would also suspect a model problem, but without more details (what you are trying to predict, what architecture, how much data, etc.) it is really hard to tell. A quick check is to compare your model against a naive persistence baseline, as sketched below.
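A minimal sketch of that comparison, assuming a univariate time series and NumPy (the function name `lag1_baseline_rmse` and the synthetic data are placeholders, not anything from your setup): if your model's error is about the same as simply copying the previous value, it has probably learned little beyond persistence.

```python
import numpy as np

def lag1_baseline_rmse(y_true, y_pred):
    """RMSE of the model vs. RMSE of simply predicting the previous value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    model_rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    # Persistence baseline: the prediction at step t is the actual value at t-1.
    persistence_rmse = np.sqrt(np.mean((y_true[1:] - y_true[:-1]) ** 2))
    return model_rmse, persistence_rmse

# Synthetic example (replace with your own series and model predictions):
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=500))        # a random-walk-like series
shifted_preds = np.r_[series[0], series[:-1]]   # a "model" that copies t-1
print(lag1_baseline_rmse(series, shifted_preds))  # two nearly equal RMSEs
```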
The difference between the predicted regression value and the actual value is called the residual. One of the main assumptions of regression analysis is that the residuals are normally distributed with mean 0, i.e., residuals must be both positive and negative. If this condition is not met (the residuals are only positive or only negative, i.e., the model consistently over-predicts or under-predicts), the regression model is poorly chosen for prediction, even though it might reasonably fit the data.
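A minimal sketch of checking these two conditions, assuming NumPy and SciPy are available (the helper name `residual_diagnostics` is hypothetical): the mean residual should be near 0, roughly half the residuals should be positive, and a normality test gives a rough idea of their distribution.

```python
import numpy as np
from scipy import stats

def residual_diagnostics(y_true, y_pred):
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)

    mean_resid = residuals.mean()            # should be close to 0
    frac_positive = (residuals > 0).mean()   # should be close to 0.5
    # Shapiro-Wilk test for normality of the residuals.
    _, normality_p = stats.shapiro(residuals)

    return {"mean_residual": mean_resid,
            "fraction_positive": frac_positive,
            "normality_p_value": normality_p}
```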
The reasons for poor prediction might be that (i) the regression model needs non-linear terms rather than being purely linear, (ii) there is multicollinearity among the independent variables, (iii) important predictive variables (features) are missing, i.e., the problem and the data are poorly understood, (iv) the data variance is non-constant, or (v) the model is used for extrapolation (prediction) far beyond the range of the independent variables used to estimate its parameters.
Overall, your focus should be on ensuring that the model meets the assumptions of regression analysis (a model problem) rather than on the machine learning technology.
You said that "...there is a constant between the predicted value and the actual value," but you cannot actually mean that the difference is always a known constant. That would mean that you know all dependent variable values with a zero estimated residual; that is, for that model the "irreducible error," sigma, is zero. So every predicted-y value would be associated with e = 0, so y = predicted-y. Or, if you have not yet accounted for the constant c, whether always negative or always positive,
y = predicted-y - c.
Perhaps you can clarify what you meant to say by providing an example.
In general,
y = predicted-y + e,
where e is a random variable, often heteroscedastic, but still with a random factor.
Perhaps you meant
y = predicted-y + c + e.
Considering that if there were an intercept term it would be part of predicted-y, what you would mean here is that there is a model bias. That is, we do not have model-unbiasedness, so the expected sum of the estimated residuals is not 0 but c. That would be a model problem. Also, sigma is still zero, which also appears to be a model problem. But I am not certain that that was your question.
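If the discrepancy really is a roughly constant offset c, a quick sanity check is to estimate it as the mean residual on held-out data and see whether subtracting it removes the gap. A minimal sketch assuming NumPy (the helper name `estimate_constant_bias` and the arrays are placeholders):

```python
import numpy as np

def estimate_constant_bias(y_true, y_pred):
    # Residuals on held-out data; their mean is an estimate of the constant c.
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    c_hat = residuals.mean()
    corrected = np.asarray(y_pred, dtype=float) + c_hat
    return c_hat, corrected

# If c_hat is far from zero and the corrected predictions fit much better,
# the model is biased (a model problem) rather than the data being at fault.
```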
James R Knaub, Alexander Kolker, Ioannis Kouroudis, Jamie Wallis: thank you all. With your help, I now have a general understanding of this problem, and I will consider these issues in my practical work.