How should I handle zero values when log-transforming variables like obesity and disease prevalence in panel data?

17 April 2025

Hi everyone,

I'm working with a panel dataset in Stata that includes variables such as type 2 diabetes cases, smoker density, and obesity prevalence across different regions and time periods. Some of these variables contain zero values, which represent actual observations (i.e., no reported cases) in certain areas.

As part of our model testing, we tried log-level and log-log functional forms, but applying the natural logarithm turned the zeros into missing values (.), because ln(0) is undefined. This caused several problems in our regressions and especially in the Hausman test, which returned the note:

"The rank of the differenced variance matrix (0) does not equal the number of coefficients being tested (3)..."

Our R-squared is also very low, only 40-50. To address this, we are considering transforming our variables with ln(x + 1) instead. I understand this is a common workaround in many contexts, but I would like to ask:

Is ln(x + 1) an acceptable transformation in this case, particularly for disease prevalence and behavioral variables like smoking, where zero indicates no incidence?

Are there any published studies or datasets that use this method, especially in Stata or in health economics or epidemiology research?

Will this approach help preserve the integrity of the sample when running tests like the Hausman test or fixed/random effects models?
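To make the difference concrete, here is a minimal sketch of what happens to zero observations under ln(x) versus ln(x + 1). It is written in Python purely for illustration; the `cases` values and the `safe_ln` helper are hypothetical and not taken from our dataset, and `None` stands in for Stata's missing value (.).

```python
import math

# Hypothetical case counts for one region over six periods.
# The zeros are real observations (no reported cases), not gaps in the data.
cases = [0, 3, 0, 12, 7, 0]


def safe_ln(x):
    """Return ln(x), or None where the log is undefined (x <= 0).

    None plays the role of a missing value here.
    """
    return math.log(x) if x > 0 else None


# Plain log: every zero becomes missing.
ln_cases = [safe_ln(x) for x in cases]

# ln(x + 1): defined at x = 0, where it equals 0.
ln1p_cases = [math.log1p(x) for x in cases]

# Count how many observations each transformation would lose.
n_missing_ln = sum(v is None for v in ln_cases)
n_missing_ln1p = sum(v is None for v in ln1p_cases)

print("missing after ln(x):    ", n_missing_ln)
print("missing after ln(x + 1):", n_missing_ln1p)
```

In this toy example ln(x + 1) keeps every observation, while ln(x) loses each period with zero cases. The trade-off is that a coefficient on ln(x + 1) can no longer be read as an exact elasticity, particularly when x is close to zero.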

Any references, insights, or recommendations would be greatly appreciated!

Thank you in advance.

Robert Boer

Why do you want to apply a log transformation to variables such as type 2 diabetes cases, smoker density, and obesity prevalence?
