What is the most appropriate regression model for a response variable that has non-binary values between 0 and 1?

Jochen Wilhelm Popular answer

datatable1 = read.delim("datatable1.txt", sep=",")

R can handle 37 variables; that's no problem. But you have only 100 rows, which means you have only about 3 values per variable, and that is really very little. Your model will vastly overfit the data. To get a useful model you should consider ignoring (most of) your variables or collecting much more data.

I further have the impression that many of your predictors are categorical variables that seem to be coded by integers (like 0/1 for a dichotomous variable). For 0/1-coded variables this is no problem, but if a variable has more than two levels you should tell R that it is a categorical variable, e.g. with

datatable1$X3 = factor(datatable1$X3)
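If several columns are coded this way, they can be converted in one step. The following is only a sketch on a made-up table: the values and the column name X4 are hypothetical, and which columns are truly categorical has to be decided from the subject matter.

```r
# Toy stand-in for the real table (made-up values; X4 is a hypothetical column).
datatable1 <- data.frame(
  y  = c(0.12, 0.40, 0.55, 0.83, 0.27, 0.61),
  X1 = c(0, 1, 0, 1, 1, 0),   # dichotomous: the 0/1 coding can stay numeric
  X3 = c(1, 2, 3, 1, 2, 3),   # three categories coded as integers
  X4 = c(2, 2, 1, 3, 1, 3)    # another integer-coded categorical variable
)

# Convert all multi-level categorical columns at once.
categorical <- c("X3", "X4")
datatable1[categorical] <- lapply(datatable1[categorical], factor)

str(datatable1)          # X3 and X4 are now listed as factors
nlevels(datatable1$X3)   # number of categories in X3
```

Keep in mind that a factor with k levels costs k - 1 coefficients in the model, which makes the problem of having too few rows even worse.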

Jochen Wilhelm

The "normal" linear regression model as suggested by Rami would only be OK when the values are sufficiently far away from 0 and from 1, so that there are no severe "bounding effects" in the data (which influence the skewness and the variance of the data).

A more appropriate choice would be a beta regression model. Google for it and you will find some sources to read.
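For orientation, beta regression is commonly written in the mean-precision form of Ferrari and Cribari-Neto. The notation below (mean \mu_i, precision \phi, coefficients \beta) is introduced here for illustration and assumes a logit link:

```latex
\begin{align}
y_i &\sim \mathrm{Beta}\bigl(\mu_i \phi,\; (1-\mu_i)\,\phi\bigr), \qquad 0 < y_i < 1, \\
\mathrm{E}(y_i) &= \mu_i, \qquad
\mathrm{Var}(y_i) = \frac{\mu_i\,(1-\mu_i)}{1+\phi}, \\
\log\frac{\mu_i}{1-\mu_i} &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}.
\end{align}
```

The variance shrinks towards zero as the mean approaches 0 or 1. This is exactly the bounding effect that a normal linear model, with its constant variance, cannot represent.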

Job Nda Nmadu

Thanks Jochen for your very useful response. But I looked through Stata and found that it is not available there. Which software has it, please?

Jochen Wilhelm

Just for example: R

https://www.r-project.org/

This is a free(!), open-source, professional, flexible and extensible environment for all kinds of statistical analyses. The functionality to fit beta regression models is offered in an extension package called "betareg":

https://cran.r-project.org/web/packages/betareg/

You may need to invest (quite a bit of) time and effort to get used to R and to learn it (but that's very rewarding!), but you won't need to invest any money.

Job Nda Nmadu

Thanks.

Job Nda Nmadu

Jochen, I am extremely grateful for your quick guidance. I have downloaded and installed R on my computer, but the paper I am working on is urgent. Can you help me to quickly estimate my model with betareg? What data format is accepted, and in which folder should the file be? I have some knowledge of programming. Suppose I have y, x1, x2, x3; what is the betareg code? Thanks.

Jochen Wilhelm

Usually you will have organized your data in a table (in Excel or some other spreadsheet program), where the variables (y, x1, x2, x3) are in columns and the data are in rows. So if you have, for instance, 50 values, you have a table with 51 rows and 4 columns; the first row contains the column names ("header").

There are many, many ways to get the data into R. Here is just one way:

Save your data table as a simple text-file. The columns can be separated by tabs (or by any other character like commas, semicolons, spaces).

You can then read the data into R with

df = read.delim("path/to/the/textfile.txt")

This expects the tab as the column separator. You can specify a different separator using the argument 'sep'. You can read the help/documentation for 'read.delim', e.g. by entering
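As a small self-contained illustration of the 'sep' argument, the sketch below first writes a comma-separated file with made-up values to a temporary location and then reads it back. With your own data you would skip the writing step and pass your own file name.

```r
# Create a small comma-separated example file (made-up values).
path <- tempfile(fileext = ".txt")
writeLines(c("y,x1,x2,x3",
             "0.21,1.5,0,3",
             "0.64,2.1,1,1",
             "0.47,1.8,0,2"), path)

# read.delim() assumes tabs by default; sep = "," switches to commas.
# The first line is used as the header (header = TRUE is the default).
df <- read.delim(path, sep = ",")

dim(df)     # 3 rows and 4 columns
names(df)   # "y" "x1" "x2" "x3"
```

If the separator is wrong, R typically reads everything into a single column, so checking dim(df) right after reading is a cheap safeguard.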

?read.delim

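Note that R accepts the slash as the path delimiter in file names on every operating system. The backslash used by Windows only works if it is doubled, because a single backslash starts an escape sequence in R strings.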
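A short sketch of how paths can be written in R; the folder names are hypothetical and only serve as an example.

```r
# Hypothetical Windows location, written with forward slashes.
p1 <- "C:/Users/me/data/textfile.txt"

# The same location with backslashes: each one has to be doubled.
p2 <- "C:\\Users\\me\\data\\textfile.txt"

# file.path() joins the parts with "/" for you.
p3 <- file.path("data", "textfile.txt")
p3             # "data/textfile.txt"

basename(p1)   # "textfile.txt"
getwd()        # relative paths such as p3 are resolved against this folder
```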

When you read in the data, R knows an object named "df", which is a 'data.frame'. You can print its content to the console by typing

df

or, to show just the first few rows:

head(df)

So you can check that the data was read correctly. You can further type

summary(df)

to get a summary of each column (= variable).
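One more check is worth doing at this stage, because beta regression assumes that the response lies strictly between 0 and 1. The sketch below uses made-up values. The rescaling shown is the one suggested by Smithson and Verkuilen and mentioned in the betareg documentation; whether it is appropriate for your data is a subject-matter decision.

```r
# Made-up example with responses that touch the boundaries 0 and 1.
df <- data.frame(y  = c(0, 0.25, 0.5, 0.9, 1),
                 x1 = c(1.2, 0.7, 3.1, 2.2, 1.9))

str(df)                       # column types at a glance
range(df$y)                   # smallest and largest response
any(df$y <= 0 | df$y >= 1)    # TRUE: some values are not strictly inside (0, 1)

# Squeeze the response slightly towards 0.5 so that exact 0s and 1s disappear.
n <- nrow(df)
df$y_adj <- (df$y * (n - 1) + 0.5) / n
range(df$y_adj)               # 0.1 0.9
```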

To fit a beta regression model you need to have the betareg package installed. If this is not yet the case, you can install it with

install.packages("betareg")

When it is installed, you need to tell R to "use" it with

library(betareg)

Then you can fit a (simple) model with

model = betareg(y~x1+x2+x3, data=df)

This model accounts only for linear and additive effects of the three predictors. You are essentially free in how you specify the model, but it should be guided by the subject matter and make some sense (be reasonable). The formula syntax is the same as for "normal" linear models. You can google for "R formula syntax" to get help.
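To make the formula syntax a bit more concrete, here is a sketch on simulated data. The coefficients, the sample size and the precision value are arbitrary assumptions chosen only to produce a response inside (0, 1). It requires the betareg package to be installed.

```r
library(betareg)

# Simulate predictors and a response in (0, 1); all numbers are arbitrary.
set.seed(1)
n   <- 200
df  <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
mu  <- plogis(0.3 + 0.5 * df$x1 - 0.3 * df$x2)   # true mean on the (0, 1) scale
phi <- 30                                        # true precision
df$y <- rbeta(n, mu * phi, (1 - mu) * phi)

# Additive model, as in the text.
m_add <- betareg(y ~ x1 + x2 + x3, data = df)

# x1 * x2 expands to x1 + x2 + x1:x2, i.e. it adds an interaction term.
m_int <- betareg(y ~ x1 * x2 + x3, data = df)

# The part after "|" models the precision parameter (here depending on x1).
m_phi <- betareg(y ~ x1 + x2 + x3 | x1, data = df)

coef(m_add)   # coefficients on the logit scale, plus the precision (phi)
```

The coefficients of the mean model are on the logit scale, so they are interpreted like those of a logistic regression rather than as direct changes in y.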

You can get a summary of the fit and the coefficients with

summary(model)

An ANOVA is not sensible for betareg models, but the package "lmtest" offers a function to perform likelihood-ratio tests on the parameters or to compare nested models (the function is "lrtest"; see also the help for betareg: ?betareg).
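A comparison of two nested models could look like the following sketch. The data are simulated with arbitrary numbers, and x2 deliberately has no effect on the response. Both betareg and lmtest need to be installed.

```r
library(betareg)
library(lmtest)

# Simulated data: only x1 influences the mean (arbitrary values).
set.seed(2)
n  <- 150
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
mu <- plogis(-0.2 + 0.6 * df$x1)
df$y <- rbeta(n, mu * 25, (1 - mu) * 25)

m_small <- betareg(y ~ x1, data = df)        # restricted model
m_large <- betareg(y ~ x1 + x2, data = df)   # adds x2

# Likelihood-ratio test: does adding x2 improve the fit?
lr <- lrtest(m_small, m_large)
lr
```

The test is only meaningful when the smaller model is a special case of the larger one and both are fitted to the same rows of data.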

There is also a predict-method that can be used to predict response values (y) for any given combination of predictor values. It is a good idea to check the reasonability of the fit by plotting the predicted values together with the data...
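Such a check could be sketched as follows, again on simulated data with arbitrary numbers and with a single predictor so that the fit can be drawn as a curve.

```r
library(betareg)

# Simulated data with one predictor (arbitrary values).
set.seed(3)
n  <- 120
df <- data.frame(x1 = runif(n, -2, 2))
mu <- plogis(0.4 + 0.9 * df$x1)
df$y <- rbeta(n, mu * 30, (1 - mu) * 30)

model <- betareg(y ~ x1, data = df)

# Predict the mean response on a fine grid of predictor values.
grid <- data.frame(x1 = seq(-2, 2, length.out = 100))
grid$fit <- predict(model, newdata = grid, type = "response")

# Data as points, fitted mean as a line.
plot(y ~ x1, data = df, ylim = c(0, 1))
lines(fit ~ x1, data = grid, lwd = 2)
```

Unlike the predictions of a normal linear model, the fitted curve always stays between 0 and 1.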

Job Nda Nmadu

Thanks Jochen. I tried my best to do it, but below is the code I developed, and my problem is that the data are not read. I have 37 variables; can R handle that many? The data file is attached here in case it is needed. Sorry to disturb you, and thanks in advance.

data("datatable1", header = TRUE, sep = ",", row.names = 1, package = "betareg")

## regression with phi as full model parameter

gy1

Jochen Wilhelm

datatable1 = read.delim("datatable1.txt", sep=",")

datatable1$X3 = factor(datatable1$X3)

Job Nda Nmadu

Great. It worked, Jochen. Thank you very much.

Job Nda Nmadu

Jochen, please let us communicate by email.

Lucas A Salas

Dear Job,

If you are using Stata you could try betafit from Maarten Buis (findit betafit). This is the original beta regression package adapted for Stata. You can use several Stata commands for analysis with it. Another option is the use of fractional logit (ssc fraclogit).

You can also take a look at the paper of Cribari-Neto and at the Stata Journal article on betafit for beta regression. There is some help available for fractional logit in the Statalist archive: http://www.stata.com/statalist/archive/2012-10/msg00427.html

Robust linear regression is not the most conservative way to perform the analysis, but under conditions that do not violate the assumptions of the regression it is an alternative to pursue.

Good luck with your analyses.
