Hi everyone,
I'm working with a panel dataset in Stata that includes variables such as type 2 diabetes cases, smoker density, and obesity prevalence across different regions and time periods. Some of these variables contain zero values, which represent actual observations (i.e., no reported cases) in certain areas.
As part of our model testing, we tried log-level and log-log functional forms, but taking the natural logarithm of these variables produced missing values (.) because ln(0) is undefined. This caused several problems in our regressions and especially in our Hausman test, which returned the note:
“The rank of the differenced variance matrix (0) does not equal the number of coefficients being tested (3)...”
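To make the missing-value problem concrete, here is a minimal sketch in a do-file. The toy data and the variable names (region, year, t2d_cases, ln_cases) are made up for illustration and are not our actual dataset.

```stata
* Toy panel, for illustration only (not our real data)
clear
input region year t2d_cases
1 2019 0
1 2020 4
2 2019 7
2 2020 0
end

* ln(0) is undefined, so Stata stores missing (.) for the zero rows
generate double ln_cases = ln(t2d_cases)

* Count how many observations are lost to the log transformation
count if missing(ln_cases)
```

Any estimation command that uses ln_cases will drop those observations, which shrinks the estimation sample.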
Our R-squared is also very low, only about 40-50%. To address the missing values, we are considering transforming our variables using ln(x + 1) instead. I understand this is a common workaround in many contexts, but I would like to ask whether it is appropriate here.
Any references, insights, or recommendations would be greatly appreciated!
Thank you in advance.