I have a panel dataset of 120 countries measuring a variable over three periods. The variable is the proportion of 1,000 respondents in each country who answered yes to a question, and I treat it as the probability that a respondent in that country will answer yes. Across countries the distribution is bimodal, with values concentrated between 0.00 and 0.10 and between 0.90 and 1.00. To bring it closer to a normal distribution, I am applying the logit transformation:

$$\log\left[\frac{p}{1-p}\right]$$

where p is the probability of a respondent answering yes. However, in three of the countries 100% of the respondents answered yes (p = 1). The odds then have a zero denominator, so the logit cannot be calculated:

$$\log\left[\frac{1}{1-1}\right] = \log\left[\frac{1}{0}\right]$$
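The failure can be reproduced directly. This is a minimal sketch using only the standard library; the function name `logit` is my own and is not taken from any particular package. It computes log[p/(1 − p)] and refuses p = 0 or p = 1, where the value is not a real number.

```python
import math

def logit(p: float) -> float:
    """Log-odds log[p / (1 - p)]; undefined when p is exactly 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        # p = 1 gives odds 1/0 (infinite); p = 0 gives log(0). Neither is finite.
        raise ValueError(f"logit is undefined for p = {p}")
    return math.log(p / (1.0 - p))

print(logit(0.5))   # 0.0
print(logit(0.95))  # log(19), a finite value
try:
    logit(1.0)
except ValueError as err:
    print(err)      # logit is undefined for p = 1.0
```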

What should I do with these three countries in the sample? Is there a legitimate way to lower their values from 1.0 so that they can be used in the formula?
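Because the number of respondents (1,000) is known, one commonly used adjustment is the empirical logit, which adds 0.5 to both the yes and no counts: log[(y + 0.5)/(n − y + 0.5)], where y is the number of yes answers. It is equivalent to replacing p with (y + 0.5)/(n + 1), which for p = 1 and n = 1,000 is 1000.5/1001, just below 1. The sketch below assumes y can be recovered as p × 1000. The 0.5 constant is a convention, not the only choice, and the name `empirical_logit` is hypothetical.

```python
import math

N_RESPONDENTS = 1000  # respondents per country-period, as stated in the question

def empirical_logit(p: float, n: int = N_RESPONDENTS) -> float:
    """Logit from counts with 0.5 added to each cell: log[(y + 0.5) / (n - y + 0.5)].

    Finite for every p in [0, 1], including p = 0 and p = 1.
    """
    y = round(p * n)  # recover the number of 'yes' answers from the proportion
    return math.log((y + 0.5) / (n - y + 0.5))

print(empirical_logit(0.5))  # 0.0
print(empirical_logit(1.0))  # equals math.log(2001), finite
print(empirical_logit(0.0))  # equals -math.log(2001), finite
```

The correction barely changes proportions away from 0 and 1, so most countries keep essentially the same transformed value.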

I will also average the panel data over the three periods to create a cross-sectional dataset. Each of the three countries of concern has a value of 1.0 in only one period and values below 1.0 in the other two, so its average probability is below 1. Would it be appropriate to average the probabilities across the periods before applying the logit transformation, rather than transforming each period and then averaging? The two options are:

$$\log\left[\frac{\bar{p}}{1-\bar{p}}\right], \qquad \bar{p} = \frac{p_1 + p_2 + p_3}{3}$$

Or

$$\frac{1}{3}\left\{\log\left[\frac{p_1}{1-p_1}\right] + \log\left[\frac{p_2}{1-p_2}\right] + \log\left[\frac{p_3}{1-p_3}\right]\right\}$$
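The two options are not interchangeable, because the logit is nonlinear: the logit of the mean is generally not the mean of the logits. The sketch below compares them on hypothetical values (not from the dataset). With one period at 1.0, only the first option is defined; with all periods below 1.0, both are defined but give different numbers.

```python
import math

def logit(p: float) -> float:
    # Raises ZeroDivisionError when p == 1.0 (odds 1/0).
    return math.log(p / (1.0 - p))

def logit_of_mean(ps: list[float]) -> float:
    """First option: average the probabilities, then take the logit."""
    return logit(sum(ps) / len(ps))

def mean_of_logits(ps: list[float]) -> float:
    """Second option: take the logit of each period, then average."""
    return sum(logit(p) for p in ps) / len(ps)

with_a_one = [0.96, 0.98, 1.0]   # hypothetical country with p = 1 in one period
print(logit_of_mean(with_a_one))  # finite: the mean probability is below 1
try:
    mean_of_logits(with_a_one)
except ZeroDivisionError:
    print("mean of logits is undefined when any period has p = 1")

below_one = [0.5, 0.9, 0.99]     # hypothetical country with all periods below 1
print(logit_of_mean(below_one), mean_of_logits(below_one))  # the two values differ
```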
