I used an unbalanced sample when estimating a logistic regression. The reviewer suggests that I should use a balanced sample, even though it will reduce the number of observations. Are there any specific suggestions or references that could give me some insight?
No, only if you are aiming to analyze temporal effects with panel data, in which case balancing lets you check the reliability of the parameter estimates.
You can also estimate the model on the unbalanced data and then use the balanced set as a robustness test, so that you do not suffer from data loss.
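A minimal sketch of how the balanced set for such a robustness test could be built, reading "balanced" in the panel sense (every unit observed in every period). The rows, the firm labels, and the helper `balanced_subset` are hypothetical and only for illustration.

```python
from collections import defaultdict

# Hypothetical unbalanced panel of (firm, year, y) rows; firm "C" lacks 2021.
rows = [
    ("A", 2020, 1), ("A", 2021, 0), ("A", 2022, 1),
    ("B", 2020, 0), ("B", 2021, 0), ("B", 2022, 0),
    ("C", 2020, 1), ("C", 2022, 1),
]


def balanced_subset(rows):
    """Keep only the units that are observed in every period present in the data."""
    all_periods = {year for _, year, _ in rows}
    periods_by_unit = defaultdict(set)
    for unit, year, _ in rows:
        periods_by_unit[unit].add(year)
    complete = {u for u, seen in periods_by_unit.items() if seen == all_periods}
    return [r for r in rows if r[0] in complete]


balanced_rows = balanced_subset(rows)
print(len(rows), "rows in the full panel;", len(balanced_rows), "in the balanced subset")
```

The main model would be estimated on `rows` and re-estimated on `balanced_rows`; if the coefficients stay similar, the results are not driven by units that drop in and out of the sample.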
Logistic regression requires a dependent variable in binary form, i.e., 0 and 1. A balanced sample means that if you have thirty 0s, you also need thirty 1s. But there is no such requirement in logistic regression. You can use unequal numbers of 0s and 1s, because you do not know in advance how many observations in your sample will score 1 and how many will score 0. We usually use a balanced sample in pattern recognition analysis, where we take a similar number of 0 and 1 companies. In logistic regression, the two groups can be of any size.
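To illustrate the point that logistic regression does not need equal numbers of 0s and 1s, here is a small simulation sketch in plain Python. The data-generating values (`TRUE_B0`, `TRUE_B1`), the sample size, and the helper `fit_logit` are assumptions made for this example, not something taken from the thread. It fits the same one-predictor model on the full unbalanced sample and on a subsample balanced by randomly discarding 0s.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(xs, ys, iters=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


# Assumed data-generating process: a rare outcome driven by one predictor.
TRUE_B0, TRUE_B1 = -3.0, 1.0
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ys = [1 if rng.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]

# Fit on the full, unbalanced sample.
full_b0, full_b1 = fit_logit(xs, ys)

# Balance the sample by keeping all 1s and an equal-sized random draw of 0s.
ones = [(x, y) for x, y in zip(xs, ys) if y == 1]
zeros = [(x, y) for x, y in zip(xs, ys) if y == 0]
balanced = ones + rng.sample(zeros, len(ones))
bal_b0, bal_b1 = fit_logit([x for x, _ in balanced], [y for _, y in balanced])

print(f"full sample:     n={len(xs)}, intercept={full_b0:.2f}, slope={full_b1:.2f}")
print(f"balanced sample: n={len(balanced)}, intercept={bal_b0:.2f}, slope={bal_b1:.2f}")
print(f"log(n0/n1) = {math.log(len(zeros) / len(ones)):.2f}")
```

In a simulation of this kind the two slope estimates are typically close to each other, while the intercept of the balanced fit shifts upward by roughly log(n0/n1), where n0 and n1 are the original counts of 0s and 1s. The balanced fit therefore tells you nothing new about the effect of the predictor; it just uses fewer observations.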
I have the same problem with logistic regression. For some variables, the imbalance with respect to the binary dependent variable is very large: more than 27 thousand observations for 0 and only 166 for 1. After I ran the model in Stata, the variable was omitted. I suspect it was omitted because the imbalance is too large. Should I use OLS regression rather than logistic regression?
I think you should read this article: http://www.win-vector.com/blog/2015/02/does-balancing-classes-improve-classifier-performance/
It describes actual experiments conducted to reveal effects of imbalanced data on several popular classifiers, including LR, SVM, RF, etc. I think you will find what you are looking for here.
If not, you will find the same conclusions here: http://www.analyticbridge.datasciencecentral.com/forum/topics/handling-imbalanced-data-when-building-regression-models
along with a few references, all of which confirm that balancing the data for logistic regression yields little, if any, improvement in prediction accuracy.