I am using a 7-point itemized scale, with an additional option of "Not Applicable (N/A)", in my questionnaire. (1) Should I consider N/A a "missing value"? (2) If yes, should it be treated as MCAR or MNAR? (3) Should I use pairwise deletion?
Your description of your setup suggests that the N/A answers should be counted as missing values and treated as MCAR. It does not seem that the N/A values would be correlated with their own question/selection numbers or with any previous or subsequent question/selection numbers.
I think Eric made the point precisely. N/A values arise because a particular item is not relevant to a subject. The relevance/irrelevance is usually random. In this case, although the N/A response may be related to the specific question asked or to some other question, it is unrelated to the response itself. Think about it this way: if a question is supposed to be answered by married people, every unmarried person will have an N/A response irrespective of what his/her response would otherwise have been. I find the materials at http://www.hqlo.com/content/2/1/29 and http://www3.nd.edu/~rwilliam/stats2/l12.pdf quite instructive.
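For question (3), a minimal sketch of what pairwise deletion does, compared with listwise deletion, may help. The item names and ratings below are made up for illustration, with None standing in for an N/A answer; nothing here comes from the original poster's data.

```python
from itertools import combinations
from math import sqrt

# Hypothetical 1-7 ratings; None marks an N/A answer treated as missing.
responses = {
    "q1": [7, 5, None, 3, 6, 2],
    "q2": [6, 4, 5, None, 7, 1],
    "q3": [5, None, 4, 2, 6, 3],
}

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def pairwise_corr(data):
    """Pairwise deletion: each pair of items uses every respondent who answered both."""
    result = {}
    for a, b in combinations(data, 2):
        pairs = [(x, y) for x, y in zip(data[a], data[b])
                 if x is not None and y is not None]
        xs, ys = zip(*pairs)
        result[(a, b)] = (pearson(xs, ys), len(pairs))
    return result

def complete_cases(data):
    """Listwise deletion: keep only respondents who answered every item."""
    return [i for i, row in enumerate(zip(*data.values())) if None not in row]

pairwise = pairwise_corr(responses)   # (correlation, n used) per item pair
complete = complete_cases(responses)  # respondent indices kept under listwise deletion
```

In this toy data each pair of items is correlated over four respondents, while listwise deletion keeps only three. Pairwise deletion retains more data, but each correlation is then based on a different subset of respondents, which is worth keeping in mind when the correlations are combined in one analysis.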
In some cases, NA implies that the subject does not belong to your sampling frame: s/he should simply not have been invited to the survey (which you could not have known up front) and may, in my opinion, be deleted from the data. In other cases, however, NA carries content of its own. NA may mean "I don't want to fill this out". Alternatively, in a study of, say, marriage satisfaction (to follow up on Abiodun), the question of whether those who got married are a random subset of the sample is indeed relevant: satisfaction with a relationship may well predict marriage (and subsequently the satisfaction with the relationship once married). A simple solution (others will no doubt have more complex extensions) is to (1) set all NAs to zero on your 1-7 scale, (2) create a dummy for NA, and (3) include in your regression the dummy and the interaction of the variable with an indicator of a valid answer (which, once the NAs are set to zero, is actually identical to the recoded variable). You can then separately assess the effect of NA and the variation in the available answers, while retaining the full sample. Heckman selection models are an obvious next step when the NA issue appears to relate to sample selection. I hope this helps!
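The three-step recoding described above can be sketched as follows. This is only an illustration under stated assumptions: the ratings and the outcome are invented, the outcome is built without noise so that the coefficients are easy to read, and the "interaction" is taken to be the rating multiplied by an answered indicator, which is the same thing as the zero-filled rating.

```python
import numpy as np

# Hypothetical data: x is a 1-7 rating with NaN for N/A, y is some outcome.
# y was constructed as 2 + 0.5 * x for respondents who answered, and 4 for N/A.
x = np.array([1, 3, np.nan, 5, 7, np.nan, 2, 6], dtype=float)
y = np.array([2.5, 3.5, 4.0, 4.5, 5.5, 4.0, 3.0, 5.0])

# Step 2: dummy that is 1 where the answer was N/A.
na_dummy = np.isnan(x).astype(float)

# Step 1: set N/A to zero on the rating scale.
x_filled = np.where(np.isnan(x), 0.0, x)

# Step 3: regress y on an intercept, the N/A dummy, and the zero-filled rating.
X = np.column_stack([np.ones_like(x_filled), na_dummy, x_filled])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept, na_effect, slope = coef
# slope:     effect of the rating among those who answered
# na_effect: N/A group's expected outcome minus the intercept
```

With this noise-free toy data the fit recovers an intercept of 2, a slope of 0.5, and an N/A coefficient of 2 (the N/A group's value of 4 minus the intercept). The slope is estimated only from respondents who gave a rating, while the full sample is retained. As far as I know, this kind of recoding is easier to defend when the value genuinely does not exist than when it exists but was simply not reported.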
I'm having the same problem, but with quantitative data. My compounds may either bind or not bind to a receptor. If they do, they get a "binding energy" value ranging from negative to positive (i.e. the more negative, the better the binding). If they don't, I assign N/A to them. The problem is that I don't know whether it's acceptable to assign zero to N/A, since these compounds would not have any binding energy value at all if they did not bind to the receptor.
I want to be statistically sound, but I don't have in-depth knowledge of statistics.
Hi all. No answer from me, just a description of a similar problem.
My aim is to build a predictive machine learning model for (say) death or survival of patients with a certain pathology, based on a group of variables (features).
Some patients underwent a particular surgical intervention, others did not (boolean variable A, no missing data), and only for those who did (A = true), another variable B related to the intervention is relevant/applicable.
So this is not a case of "missing because the subject did not reply" or "I lost the measurement". I cannot just assign NaN to B for those who have A = false; otherwise some data imputation mechanism will kick in and nonsensical ("average") B values will be created to fill the holes.
How should I proceed correctly?
Would simply assigning a fixed value to B, far from the typical ones, help?
I will read through the suggested material and search for other sources, and I will be back if I find an answer.
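On the fixed-value idea raised above, here is a small sketch of one thing that can be checked directly. It assumes a plain, unregularized linear predictor fitted by least squares with A kept in the model; the data, the helper names design_matrix and fitted_values, and the outcome scores are all hypothetical. Under those assumptions, the placeholder chosen for B where A is false changes the intercept and the coefficient of A, but not the coefficient of B or the fitted values.

```python
import numpy as np

# Hypothetical data: A = patient had the intervention,
# B = measurement that is only defined when A is true.
a = np.array([1, 0, 1, 1, 0, 1, 0, 1])
b = np.array([2.0, np.nan, 3.5, 1.0, np.nan, 4.0, np.nan, 2.5])
y = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])  # illustrative outcome

def design_matrix(a, b, placeholder):
    """Columns: intercept, A, and B with a fixed placeholder where A is false."""
    a = np.asarray(a, dtype=float)
    b_filled = np.where(a == 1, b, placeholder)
    return np.column_stack([np.ones_like(a), a, b_filled])

def fitted_values(placeholder):
    """Least-squares fit; returns the fitted values and the coefficients."""
    X = design_matrix(a, b, placeholder)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef, coef

pred_zero, coef_zero = fitted_values(0.0)     # placeholder inside a "neutral" range
pred_far, coef_far = fitted_values(-999.0)    # placeholder far from typical B values

same_predictions = np.allclose(pred_zero, pred_far)
same_b_slope = np.isclose(coef_zero[2], coef_far[2])
```

The reason is that, with an intercept and A in the model, the placeholder column is just a combination of columns already present, so the constant is absorbed by the other coefficients. This does not carry over automatically to regularized, distance-based, or other non-linear models, where the chosen placeholder can matter, so it would need to be checked for the model actually used.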