Arguably, in GP the independent regressors are the initial variables (i.e., the columns of your regression dataset), which are supplied as terminal nodes. It sounds odd to me to call a tree branch an independent regressor, as it is a transformation of the initial variables.
Note that, in general, your initial variables may not be independent at all. You may want to do some data preprocessing to select the initial variables before trying to generate a regression model.
I am not aware of a standard adjusted R squared computation in Genetic Programming. The easiest approach, in my view, is to use the formula given in, e.g., https://www.quora.com/What-is-the-difference-between-R-squared-and-Adjusted-R-squared , taking K = the number of distinct initial variables appearing in the GP solution tree. You should think about whether this makes sense for your case, and what information it actually gives you.
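For reference, that adjusted R squared formula (a standard statistics result, nothing GP-specific) is:

$$ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - K - 1} $$

where $n$ is the number of training samples and $K$ is, under the interpretation above, the number of distinct initial variables used by the tree.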
The predicted R squared is computed like the normal one, but on data that was not used during the training phase. In other words, split your dataset in two (training and test data), run GP on the training data to find a good model (tree), and then compute the predicted R squared of that model on the test data.
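As a minimal sketch in Python, assuming you use a GP library with a scikit-learn-style fit/predict interface (gplearn's SymbolicRegressor is one example; the toy data here is made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from gplearn.genetic import SymbolicRegressor  # one possible GP library

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
y = X[:, 0] ** 2 + X[:, 1] - 0.5 * X[:, 2]   # toy target; column 3 is irrelevant

# Split once: GP only ever sees the training part during evolution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

gp = SymbolicRegressor(population_size=500, generations=20, random_state=0)
gp.fit(X_train, y_train)

# Predicted R squared: the ordinary R^2 formula, applied to unseen data.
print("predicted R^2:", r2_score(y_test, gp.predict(X_test)))
```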
You want the model evolved in the training phase to have a small error on the training data without being overly complicated (i.e., a small number of nodes and a low K), as simpler models typically generalize better. Have a look at multi-objective GP for regression.
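A true multi-objective GP would keep error and complexity as separate objectives (e.g., with NSGA-II and a Pareto front). A cruder but common alternative is parsimony pressure, i.e., folding complexity into a single fitness value. A hypothetical scalarized fitness might look like this (the weights alpha and beta are made up and would need tuning):

```python
def parsimony_fitness(mse, n_nodes, k_vars, alpha=1e-3, beta=1e-2):
    """Penalize training error, tree size, and the number of distinct
    variables used. Lower is better. A real multi-objective GP would
    keep these as separate objectives instead of summing them."""
    return mse + alpha * n_nodes + beta * k_vars
```

(For what it's worth, gplearn exposes a similar idea via its parsimony_coefficient parameter.)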
Your argument that the initial variables are the independent regressors makes more sense to me than using the number of branches. But I think the number of variables in the final GP model is what should be counted. Suppose a GP model is built from 20 variables, but only 13 of them show up in the fittest GP structure. In that case, don't you feel the number of independent regressors should be 13 and NOT 20? Of course, as you suggest, "some data preprocessing to select the initial variables before trying to generate a regression model" (together with some variable sensitivity analysis) is perhaps a good way to ensure (but not necessarily guarantee, GP being a black box) that all initially selected variables make it into the fittest GP structure.
Sorry for not explaining that clearly enough. That is exactly what I meant:
"considering K = number of different initial variables of the GP solution tree" (that is, 13 in your example).
What do you mean by "GP being a black box"? The final solution, e.g., a tree, can be inspected and read. This is called a white-box model, as you can understand what is going on (unless the number of nodes is so large that the tree becomes hard to interpret). You can easily count the number of unique independent variables, as in the sketch below.
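As a minimal sketch, assuming a hypothetical tree encoding where a node is either a variable name (string), a constant (number), or an (operator, children) pair:

```python
def unique_variables(node):
    """Collect the distinct initial variables appearing in a GP tree."""
    if isinstance(node, str):        # terminal: an initial variable
        return {node}
    if isinstance(node, tuple):      # internal node: (operator, children)
        _, children = node
        found = set()
        for child in children:
            found |= unique_variables(child)
        return found
    return set()                     # terminal: a constant

# (x1 * x1) + (x2 - 3.0)  ->  K = 2
tree = ("add", [("mul", ["x1", "x1"]), ("sub", ["x2", 3.0])])
print(len(unique_variables(tree)))   # 2
```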
Also, why would you want to ensure that all the initial variables end up in the final solution? The fact that they are independent does not mean that they are meaningful for the variable you are regressing.
E.g., let's say you want to regress people's height. You have four variables: age, race, country of origin, and favorite color. Probably, race and country of origin are strongly interdependent, and you may want to use only one of the two (say, race). Age, race, and favorite color are reasonably independent, but favorite color is not a meaningful variable for height. So, ideally, your final model should only use age and race. If it uses favorite color, it is likely overfitting to some noise in your training data, and you will see a lower score in the predicted R squared when testing the model on the test data.