What is the most appropriate regression model for a response variable that takes non-binary values between 0 and 1? An example of a paper would be appreciated.
The "normal" linear regression model suggested by Rami would only be acceptable if the values are sufficiently far from 0 and from 1, so that there are no severe "bounding effects" in the data (which affect the skewness and the variance of the data).
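To illustrate the bounding effect, here is a small sketch on simulated data (all numbers are made up for illustration): an ordinary linear model fitted to a response that lives in (0, 1) can predict values below 0 and above 1 near the edges of the predictor range.

```r
set.seed(42)
n  <- 200
x  <- runif(n, -3, 3)
mu <- plogis(1.5 * x)                    # true mean follows an S-shaped curve
y  <- rbeta(n, mu * 20, (1 - mu) * 20)   # bounded response with made-up precision 20

lin  <- lm(y ~ x)                        # "normal" linear regression
pred <- predict(lin, newdata = data.frame(x = c(-3, 3)))
pred                                     # predictions at the edges of the x range
```

The straight line cannot bend towards 0 and 1 the way the data do, so its predictions leave the admissible range.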
A more appropriate choice would be a beta regression model. Google for it and you will find some sources to read.
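For orientation, a sketch of the model behind beta regression in the common mean-precision parameterization (to my knowledge this is the one used by the betareg package, with a logit link for the mean by default); x1, x2, x3 are placeholder predictors. The variance term shows why values close to 0 or 1 have a smaller spread, which is the bounding effect described above.

```latex
\begin{aligned}
y_i &\sim \mathrm{Beta}\big(\mu_i\phi,\ (1-\mu_i)\phi\big), \qquad 0 < y_i < 1,\\
\mathrm{E}(y_i) &= \mu_i, \qquad \mathrm{Var}(y_i) = \frac{\mu_i(1-\mu_i)}{1+\phi},\\
\log\frac{\mu_i}{1-\mu_i} &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}.
\end{aligned}
```

Here \(\mu_i\) is the mean of the i-th response and \(\phi > 0\) is a precision parameter: the larger \(\phi\), the smaller the variance for a given mean.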
R is a free(!), open-source, professional, flexible and extensible environment for all kinds of statistical analyses. The functionality to fit beta regression models is offered in an extension package called "betareg":
https://cran.r-project.org/web/packages/betareg/
You may need to invest (quite a bit of) time and effort to get used to R and to learn it (but that's very rewarding!), but you won't need to invest any money.
Jochem, I am extremely grateful for your quick guidance. I have downloaded and installed R on my computer, but the paper I am working on is urgent. Can you help me to quickly estimate my model with betareg? What data format is accepted, and in which folder should the file be? I have some knowledge of programming. Suppose I have y, x1, x2, x3; what is the betareg code? Thanks.
Usually you will have organized your data in a table (in Excel or some other spreadsheet program), where the variables (y, x1, x2, x3) are in columns and the observations are in rows. So if you have, for instance, 50 observations, you have a table with 51 rows and 4 columns; the first row contains the column names ("header").
There are many, many ways to get the data into R. Here is just one way:
Save your data table as a plain text file. The columns can be separated by tabs (or by any other character such as commas, semicolons or spaces).
You can then read the data into R with
df = read.delim("path/to/the/textfile.txt")
This expects tabs as the column separator. You can specify a different separator using the argument 'sep'. You can read the help/documentation for 'read.delim', e.g. by entering
?read.delim
Note that in R the path delimiter in the file name is the forward slash (/), not the single backslash used by Windows; if you prefer backslashes, you have to double them (\\).
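A minimal, self-contained sketch of these steps. Since I don't know your file, it first writes a small made-up table (columns y, x1, x2, x3) to a temporary file with write.table() and then reads it back; replace the temporary path with the path to your own file (the Windows path in the comment is hypothetical).

```r
# Made-up example table; replace with your own data file
example <- data.frame(y  = c(0.12, 0.45, 0.78),
                      x1 = c(1.2, 3.4, 5.6),
                      x2 = c(0, 1, 1),
                      x3 = c(10, 20, 30))

# Save it as a tab-separated text file with a header row
path <- tempfile(fileext = ".txt")
write.table(example, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back; with your own file the call might look like
# df <- read.delim("C:/Users/yourname/data/textfile.txt")   (hypothetical path)
df <- read.delim(path)
str(df)

# A semicolon-separated file is read the same way, with 'sep' set accordingly
path2 <- tempfile(fileext = ".csv")
write.table(example, path2, sep = ";", row.names = FALSE, quote = FALSE)
df2 <- read.delim(path2, sep = ";")
```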
After reading in the data, R has an object named "df", which is a 'data.frame'. You can print its contents to the console by typing
df
or, to show just the first few rows:
head(df)
This way you can check that the data were read correctly. You can further type
summary(df)
to get a summary of each column (= variable).
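Because the beta distribution is only defined strictly between 0 and 1, it is worth looking at Min. and Max. in the summary to spot exact 0s or 1s before fitting. A sketch on made-up data; the squeeze transformation y' = (y(n - 1) + 0.5)/n shown here is from Smithson and Verkuilen (2006) and is one commonly used workaround, not something prescribed in this thread.

```r
# Made-up data containing an exact 0 and an exact 1
df <- data.frame(y  = c(0, 0.25, 0.5, 0.75, 1),
                 x1 = c(1, 2, 3, 4, 5))

summary(df$y)                              # look at Min. and Max.
n_boundary <- sum(df$y <= 0 | df$y >= 1)   # number of values on the boundary
n_boundary

# Squeeze [0, 1] into the open interval (0, 1) (Smithson & Verkuilen, 2006)
n <- nrow(df)
df$y_sq <- (df$y * (n - 1) + 0.5) / n
range(df$y_sq)
```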
To fit a beta regression model you need to have the betareg package installed. If it is not installed yet, you can install it with
install.packages("betareg")
When it is installed, you need to tell R to "use" it with
library(betareg)
Then you can fit a (simple) model with
model = betareg(y~x1+x2+x3, data=df)
This model accounts only for linear and additive effects of the three predictors. You are essentially free in how you specify the model; the specification should be guided by the subject matter and make sense (be reasonable). The formula syntax is the same as for "normal" linear models. You can google for "R formula syntax" to get help.
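A sketch of a few model specifications on simulated data (coefficients, sample size and the precision phi = 30 are made up; the names y, x1, x2, x3 just mirror the question). It shows the additive model, an interaction plus a quadratic term in the usual R formula syntax, and betareg's optional second formula part after "|", which lets the precision depend on predictors too.

```r
library(betareg)

set.seed(1)
n  <- 200
df <- data.frame(x1 = runif(n), x2 = runif(n), x3 = rnorm(n))

# Simulate a beta-distributed response: logit-linear mean, precision phi = 30
mu   <- plogis(-0.5 + 1.2 * df$x1 - 0.8 * df$x2 + 0.3 * df$x3)
phi  <- 30
df$y <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)

# Linear, additive effects on the logit scale of the mean (default logit link)
m1 <- betareg(y ~ x1 + x2 + x3, data = df)

# Same formula syntax as lm(): interaction x1:x2 and a quadratic term in x3
m2 <- betareg(y ~ x1 * x2 + I(x3^2), data = df)

# Two-part formula: mean model before "|", precision model after it
m3 <- betareg(y ~ x1 + x2 + x3 | x1, data = df)

coef(m1)   # mean coefficients plus the constant precision "(phi)"
```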
You can get a summary of the fit and the coefficients with
summary(model)
An ANOVA is not sensible for betareg models, but the package "lmtest" offers a function to perform likelihood-ratio tests on individual parameters or to compare nested models (the function is "lrtest"; see also the help for betareg: ?betareg).
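A sketch of such a likelihood-ratio test for two nested models on simulated data (by construction x2 has no true effect; all values are made up). It assumes the lmtest package is installed, e.g. with install.packages("lmtest"). The test statistic is twice the difference of the log-likelihoods, which is also computed by hand for comparison.

```r
library(betareg)
library(lmtest)

set.seed(2)
n  <- 150
df <- data.frame(x1 = runif(n), x2 = runif(n))
mu   <- plogis(-0.3 + 1.5 * df$x1)        # x2 has no effect in this simulation
df$y <- rbeta(n, mu * 25, (1 - mu) * 25)

full    <- betareg(y ~ x1 + x2, data = df)
reduced <- betareg(y ~ x1, data = df)

# Likelihood-ratio test, H0: the coefficient of x2 is zero
lr <- lrtest(reduced, full)
lr

# The same statistic by hand: 2 * (logLik(full) - logLik(reduced))
lr_manual <- as.numeric(2 * (logLik(full) - logLik(reduced)))
lr_manual
```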
There is also a predict() method that can be used to predict response values (y) for any given combination of predictor values. It is a good idea to check whether the fit is reasonable by plotting the predicted values together with the data...
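A minimal sketch of this check with one predictor on simulated data (the true curve and the precision 20 are made up): predict the mean response on a grid of x1 values with predict(..., type = "response") and draw it over the data.

```r
library(betareg)

set.seed(3)
n  <- 150
df <- data.frame(x1 = runif(n, 0, 10))
mu   <- plogis(-2 + 0.4 * df$x1)           # made-up true mean curve
df$y <- rbeta(n, mu * 20, (1 - mu) * 20)   # made-up precision phi = 20

model <- betareg(y ~ x1, data = df)

# Predicted mean of y on a fine grid of x1 values
grid <- data.frame(x1 = seq(0, 10, length.out = 101))
grid$fit <- predict(model, newdata = grid, type = "response")

# Data plus fitted curve: a quick visual check of the fit
plot(df$x1, df$y, ylim = c(0, 1), xlab = "x1", ylab = "y")
lines(grid$x1, grid$fit, lwd = 2)
```

With several predictors you would hold the other predictors at typical values (e.g. their means) in the grid.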
Thanks, Jochem. I tried my best, but... I wrote some code, but my problem is that the data are not read. I have 37 variables; can R handle that many? The data file is attached in case it is needed. Sorry to bother you, and thanks in advance.
R can handle 37 variables, that's no problem. But you have only 100 rows, which means you have only about 3 observations per variable, which is really very little. Your model will vastly overfit the data. To get a useful model you should consider ignoring (most of) your variables or collecting much more data.
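To see why so few observations per variable is a problem, here is a quick simulation with an ordinary linear model (used instead of beta regression only to keep it short): 100 rows, 36 predictors and a response that is pure noise. The R-squared still comes out clearly above zero although no predictor has any real effect, while the adjusted R-squared stays close to zero.

```r
set.seed(4)
n <- 100
p <- 36
noise <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))  # columns V1 to V36, pure noise
noise$y <- rnorm(n)                                               # unrelated to all predictors

fit    <- lm(y ~ ., data = noise)
r2     <- summary(fit)$r.squared
adj_r2 <- summary(fit)$adj.r.squared
round(c(r2 = r2, adj_r2 = adj_r2), 3)
```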
I further have the impression that many of your predictors are categorical variables coded as integers (like 0/1 for a dichotomous variable). For 0/1-coded variables this is no problem, but if a variable has more than two levels you should tell R that it is a categorical variable by converting it to a factor, e.g. with the function factor().
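A sketch of such a conversion on made-up data; the column name 'region' and its labels are hypothetical. After conversion, a model formula uses one dummy variable per non-reference level instead of treating the codes 1, 2, 3 as a numeric slope.

```r
# Made-up data: 'region' uses the integer codes 1, 2, 3 for three categories
df <- data.frame(y      = c(0.2, 0.4, 0.6, 0.3, 0.5, 0.7),
                 region = c(1, 2, 3, 1, 2, 3))

# Tell R the codes are categories, not numbers (labels are optional and hypothetical)
df$region <- factor(df$region, levels = c(1, 2, 3),
                    labels = c("north", "centre", "south"))
str(df)

# The design matrix now has one dummy column per non-reference level
colnames(model.matrix(y ~ region, data = df))
```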
If you are using Stata, you could try betafit by Maarten Buis (findit betafit). This is the original beta regression package adapted for Stata. You can use several Stata commands for analysis with it. Another option is the use of fractional logit (ssc fraclogit).
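For readers using R rather than Stata, a sketch of the fractional logit idea (Papke and Wooldridge's quasi-likelihood approach) as a GLM with the quasibinomial family and a logit link; this is an R analogue, not the Stata commands named above. Unlike beta regression it tolerates exact 0s and 1s. The data are made up, and robust (sandwich) standard errors, which are usually recommended for this model, are not shown.

```r
set.seed(5)
n  <- 200
df <- data.frame(x1 = rnorm(n))
df$y <- plogis(0.5 + 0.8 * df$x1 + rnorm(n, sd = 0.5))  # made-up fractional response
df$y[1:5] <- c(0, 0, 1, 1, 0)                            # a few exact 0s and 1s

# Fractional logit: quasi-likelihood GLM with logit link
frac <- glm(y ~ x1, family = quasibinomial(link = "logit"), data = df)
summary(frac)$coefficients
```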
You can also take a look at the paper by Cribari-Neto and at the Stata Journal for betafit for beta regression. There is some help available for fractional logit on Statalist: http://www.stata.com/statalist/archive/2012-10/msg00427.html
Using robust linear regression is not the most conservative way to perform the analysis, but under conditions that do not violate the assumptions of the regression it is an alternative worth pursuing.
@Jochem: Thank you for that piece of information. I was thinking of logistic regression, Poisson regression, etc. I am just getting to know beta regression for the first time. Thank you for your efforts.