I have a metric that gives the number of registered users as a percentage of the total number of users. The problem is that in a third of the cases in my dataset the total number of users is zero, so the metric comes out as zero. Is this accurate? The options I have come up with are as follows:

(1) Keep 0. This correctly describes cases that have users but no registered users; for cases with no users at all, however, 0 can be deceptive.

(2) Alternatively, I could define these cases as NA. The problem is that I cannot omit them, since I am using other variables in the dataset and the NAs make up 1/3 of it. So I would be forced to use a fix such as na.roughfix in R, which replaces all NAs with the median. But this is still deceptive: I know for certain that there were no users, so replacing the percentage with the median will produce false results.

(3) A final option I see is to use an extreme, probably negative, number such as -1. However, I have no idea what the implications of this would be for running correlations and linear regression models.
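
To make options (2) and (3) concrete, here is a minimal sketch in base R. All of the numbers and variable names (`registered`, `total`, `x`) are made up for illustration and are not my real data, and the median step is only a base-R stand-in for what na.roughfix does to a numeric column.

```r
# Hypothetical toy data, not the real dataset.
registered <- c(1, 2, 0, 8, 0, 15)
total      <- c(10, 10, 0, 20, 0, 25)
x          <- c(1, 2, 3, 4, 5, 6)   # some other variable in the dataset

# Option 2: the percentage is NA wherever total users are zero.
pct <- ifelse(total == 0, NA_real_, 100 * registered / total)

# Median imputation (base-R stand-in for na.roughfix on a numeric column).
pct_median <- pct
pct_median[is.na(pct_median)] <- median(pct, na.rm = TRUE)

# Option 3: code the undefined cases as -1.
pct_sentinel <- ifelse(is.na(pct), -1, pct)

# Correlation with x under each treatment.
r_complete <- cor(x, pct, use = "complete.obs")  # only cases with users
r_median   <- cor(x, pct_median)                 # NAs replaced by the median
r_sentinel <- cor(x, pct_sentinel)               # NAs replaced by -1

print(c(complete = r_complete, median = r_median, sentinel = r_sentinel))
```

In this toy example the three treatments give three different correlations with `x`, which is exactly the trade-off I am worried about.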

What do you think about this? Each option has trade-offs, but I want one that will be "universally" accepted.
