Hello,
I am working on a regression problem where I have to estimate the acceptance rate of an offer as a percentage. My dependent variable, however, is categorical (0 or 1). I tried to fit a logistic regression, but it didn't turn out very well: even though 17% of the whole data set are 1s and the sample size is more than 100k, my model failed to classify the 1s. Of the 4.5k 1s in the validation data, it classified only 20-odd as 1s.
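One possible reason for so few predicted 1s, sketched here under assumptions rather than checked against my data: a logistic model outputs probabilities, and with a base rate near 17% and weak predictors, very few of those probabilities reach the usual 0.5 cut-off. The simulation below is hypothetical throughout. The single predictor `x`, the slope of 0.5, and the use of the true probabilities as a stand-in for a fitted model's predictions are all assumptions made for illustration.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Assumption: intercept set to the log-odds of a 17% base rate,
# plus one weak, made-up predictor x with slope 0.5.
intercept = math.log(0.17 / 0.83)
slope = 0.5

probs, labels = [], []
for _ in range(20000):
    x = random.gauss(0, 1)
    p = sigmoid(intercept + slope * x)       # stand-in for a model's predicted probability
    probs.append(p)
    labels.append(int(random.random() < p))  # simulated 0/1 outcome

def count_flagged(probabilities, threshold):
    """Number of rows that would be classified as 1 at the given cut-off."""
    return sum(p >= threshold for p in probabilities)

flagged_at_half = count_flagged(probs, 0.5)    # default-style cut-off
flagged_at_base = count_flagged(probs, 0.17)   # cut-off near the base rate

print("share of 1s:", sum(labels) / len(labels))
print("flagged at 0.50:", flagged_at_half)
print("flagged at 0.17:", flagged_at_base)
```

If something like this applies, the predicted probabilities themselves may still be usable as acceptance percentages even though the 0/1 classification at 0.5 looks poor.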
So I took a different route and tried to make my dependent variable appear as a percentage (rather than 0 or 1) by clubbing 10 rows together. For the numerical predictor variables (n = 100), I took a measure of central tendency within each group.
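A minimal sketch of the clubbing step, to make the idea concrete. The function name `club_rows`, the use of the mean as the central tendency measure, the grouping of consecutive rows, and the toy data are all assumptions for illustration, not my actual pipeline.

```python
import random
from statistics import mean

def club_rows(rows, group_size=10):
    """Collapse consecutive groups of rows into one row each.

    rows: list of (features, label) pairs, where features is a list of
    floats and label is 0 or 1.
    Returns a list of (mean_features, acceptance_rate) pairs.
    A trailing group with fewer than group_size rows is dropped.
    """
    grouped = []
    for start in range(0, len(rows) - group_size + 1, group_size):
        chunk = rows[start:start + group_size]
        # Column-wise mean of the numerical predictors in this group.
        mean_features = [mean(col) for col in zip(*(f for f, _ in chunk))]
        # Share of 1s in the group becomes the new dependent variable.
        acceptance_rate = sum(y for _, y in chunk) / group_size
        grouped.append((mean_features, acceptance_rate))
    return grouped

# Toy data: two made-up numerical predictors and a 0/1 outcome with about 17% ones.
random.seed(0)
rows = [
    ([random.gauss(0, 1), random.gauss(5, 2)], int(random.random() < 0.17))
    for _ in range(1000)
]

grouped = club_rows(rows, group_size=10)
print(len(grouped), "grouped rows; first:", grouped[0])
```

With groups of 10, the new dependent variable can only take the values 0%, 10%, ..., 100%, and how the rows are assigned to groups (here simply in file order) changes the result.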
I know that there is a loss of information with this approach, but the manipulated data makes it easier to see what the aim of my analysis is.
I am asking this question because I haven't come across anything like this so far. If anyone has worked with a similar approach, please share your inputs.
Thanks and Regards,
Irshad