What is the ideal treatment of responses in which the respondents did not answer all the questions/fields? Should we discard these partially completed entries altogether, or use the answered fields in our analysis? I am not sure. Kindly help.
Before you drop any of those survey returns with missing responses, consider the cost you incurred while gathering them. Rather than throwing those returns away, review the questions the respondents skipped. Did you word them in a manner that was too difficult for respondents to understand, or were the questions too sensitive to answer? This will help you draft questions properly in your next survey. But what should you do with returns that have missing responses? I would suggest you still include them, but be transparent in your presentation of results by reporting the percentage of respondents who skipped each question (some surveys include a "don't know"/"decline to answer" option to ensure there are no missing responses in the returns). The upside of this approach is that you can factor the missingness into the margin of error of your final analyses (which would widen the confidence intervals of your statistics if the sample size is too small). In some of my surveys, I only dropped returns when respondents filled in their profile but answered none of the survey questions, or when their answers were incomprehensible (yes, there are respondents who are not serious about providing answers, and I find it annoying most of the time); you will encounter them if you have open-ended questions. Finally, before you drop any of your partially completed returns, think it over, go through them, and review/validate their responses.
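To make the "report the skip percentage" suggestion concrete, here is a minimal sketch in Python/pandas (not from the original reply; the DataFrame `returns` and its question columns are hypothetical):

```python
# Report the percentage of respondents who skipped each question,
# so readers of the results can see where item nonresponse occurred.
import pandas as pd

# Hypothetical survey returns: one row per return, None = skipped item.
returns = pd.DataFrame({
    "q1": [5, 3, None, 4, 2],
    "q2": [1, None, None, 2, 1],
    "q3": ["agree", "agree", "disagree", None, "agree"],
})

# Share of missing (skipped) answers per question, as a percentage.
skip_rates = returns.isna().mean() * 100
print(skip_rates.round(1))
```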
In your case, include only the complete forms in your reporting. Forms with incomplete information should be identified by a unique code; they should not be included for analysis or reporting purposes.
For reporting purposes, if questionnaires are completed but have some missing items or "don't know" category items, you will also have to report these in your report.
But for in-depth analysis, you will also need to exclude cases with otherwise complete forms that have some missing items or "don't know" category items, because these two response types can badly distort the results of regression analysis and undermine the representativeness of your results. A sketch of this flagging-and-filtering step follows below.
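A minimal sketch in Python/pandas of that workflow, with hypothetical column names and a hypothetical "don't know" code of 9 (the original reply does not specify a coding scheme):

```python
# Flag each form with a completeness status code, then restrict the
# in-depth (regression) analysis to the complete forms only.
import pandas as pd

forms = pd.DataFrame({
    "id": [101, 102, 103],
    "q1": [4, None, 5],
    "q2": [2, 3, 9],   # assumption: 9 is the "don't know" code
})

DONT_KNOW = 9
items = forms[["q1", "q2"]]
incomplete = items.isna().any(axis=1) | (items == DONT_KNOW).any(axis=1)
forms["status"] = incomplete.map({True: "INCOMPLETE", False: "COMPLETE"})

# Only complete forms enter the regression dataset.
analysis_set = forms[forms["status"] == "COMPLETE"]
print(forms)
print(analysis_set)
```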
"Item nonresponse" is an area that has been studied extensively, and you can research that term. - For such inference for a finite population, the study of "response propensity" groups is often recommended. You could research that on the internet. - Also, one useful feature of regression is that you can estimate the "variance of the prediction error," and you will find that in econometrics books. It is very important to have estimations of accuracy. Nonsampling error, such as measurement error, will generally be present, and may become worse if you try to collect too much data. Knowledge of measurement error is quite problematic.
Depends on whether you mean missing answers to survey questions (item nonresponse) or missing sampled units (unit nonresponse). Weighting is usually used to adjust for unit nonresponse (i.e., when no one from the sample unit has responded at all), and imputation is usually used for item missing data. As Paul states, both are dependent on models, but leaving the data unadjusted when you have reason to believe they are not representative (and maybe even when you don't have an explanation off the top of your head) doesn't seem like a good idea to me. As long as you're transparent about your assumptions and estimation procedures so someone can replicate your work, you'll be following good practice. I'm sure people debate adjusting v. not adjusting data all the time, but in my field adjustment is virtually a given, and the debate is about what type of adjustment to do.
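For the item-missing side, a minimal sketch in Python/pandas (hypothetical data; mean imputation is shown only as the simplest stand-in for a proper model-based or multiple-imputation method):

```python
# Item nonresponse: fill in the missing items using information from
# the respondents, rather than dropping the whole record.
import pandas as pd

df = pd.DataFrame({
    "income": [40, None, 55, None, 70],
    "age": [25, 31, 47, 52, 38],
})

# Simplest possible imputation: replace missing income with the
# respondent mean. Real applications would condition on covariates.
df["income_imputed"] = df["income"].fillna(df["income"].mean())
print(df)
```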
For unit nonresponse, we can learn that certain types of people are less likely to respond to a survey by comparing the demographics of our respondents with population estimates from a more thorough data collection (like the decennial Census count for demographic surveys), and then use those counts as control totals in weighting.
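A minimal sketch in Python/pandas of that weighting step (the control totals and respondent counts here are made up for illustration):

```python
# Cell weighting to external control totals: groups that are
# underrepresented among respondents get weighted up.
import pandas as pd

respondents = pd.DataFrame({"sex": ["M", "M", "F", "F", "F"]})

# Population control totals from a more thorough collection
# (e.g., a Census count). Hypothetical numbers.
control_totals = {"M": 5000, "F": 5000}

# Weight = population count in the cell / respondent count in the cell.
cell_counts = respondents["sex"].value_counts()
respondents["weight"] = respondents["sex"].map(
    lambda s: control_totals[s] / cell_counts[s]
)
print(respondents)
```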
It's a little trickier with item nonresponse because you don't have that same external population, but if you think of your unit respondents as a "sub-population" of sorts (or at least a larger group within which you can compare item respondents and nonrespondents), you could look at differences in item nonresponse rates across groups of respondents to the survey. For example, you probably have nearly complete data on some variables (average item nonresponse rates are 1-2%), and some may be 0%. You could look at item nonresponse rates on your target variable (the one with all the missing data you're worried about) across groups defined by those variables. Is item nonresponse higher or lower among men v. women, older v. younger respondents, by race, etc.? This is a simplified single-variable scenario of course, and you would want to do a similar exploration for each variable about which you're concerned.
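A minimal sketch of that single-variable comparison in Python/pandas (hypothetical data; `target` stands in for the variable with the missing data you're worried about):

```python
# Compare item nonresponse rates on the target variable across groups
# of unit respondents, to look for evidence of differential missingness.
import pandas as pd

df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "M", "F"],
    "target": [10, None, 12, None, 15, 9],
})

# Percentage of each group that skipped the target item.
item_nr_by_group = df["target"].isna().groupby(df["sex"]).mean() * 100
print(item_nr_by_group.round(1))
```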
I've only used IVEware from UM (SAS and stand-alone), but I know some people like Joe Schafer's imputation software for R.
http://www.isr.umich.edu/src/smp/ive/
http://sites.stat.psu.edu/~jls/misoftwa.html
You'll definitely want to read Rubin and Little's work on this.
Generally in survey analysis, we just identify the incomplete questionnaires in the data sheet with a code indicating the reason for incompleteness. Such incomplete questionnaires create missing data, which creates inconsistency in the primary survey results. These are designated as system missing (excluded from analysis), while nonresponse to a particular question, or a blank response on a question in the questionnaire, is classified as user missing in software packages and is shown as a separate category in the primary results of survey reports. Sometimes these are also merged with the "Don't Know" category for the respective question so they appear in the survey results. It is at the discretion of the analyst/researcher, depending on the situation.
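A minimal sketch in Python/pandas of that coding scheme, with hypothetical codes (-9 for system missing, -8 for user missing, i.e., "don't know"/refused; the original reply does not name specific codes):

```python
# Keep user-missing responses visible as their own category in primary
# tabulations, but drop both kinds of missing from numeric analysis.
import pandas as pd

answers = pd.Series([3, -8, 2, -9, 1, -8])

# For reporting: show user/system missing as labeled categories.
labels = {-8: "Don't know / refused", -9: "System missing"}
print(answers.map(lambda v: labels.get(v, v)).value_counts())

# For analysis: recode both missing codes to NaN so they are excluded.
analysis_values = answers.where(answers >= 0)
print(analysis_values.mean())  # mean of the valid answers only
```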
Before regression or any type of prediction, the dataset should ideally be free of system missings. Moreover, slopes are usually (though not in all models) estimated after the missing values are automatically ignored (sometimes user missing as well as system missing or not-applicable cases). But for exploratory data analysis, these categories should be reported (nonresponse/user missings/skipped options or items of the questions). The main reason for reporting missing values in descriptive/bivariate analysis is that they have their own background and theory. The World Bank's "The Power of Survey Design" and Sharon Lohr's "Sampling: Design and Analysis" describe the backgrounds of missing values well.
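A minimal sketch in Python of the "slopes are estimated after ignoring missing values" point, using statsmodels' listwise deletion (hypothetical data):

```python
# OLS with missing="drop" silently excludes any row where the outcome
# or a predictor is missing (listwise deletion).
import numpy as np
import statsmodels.api as sm

y = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
x = np.array([1.0, 2.0, 3.0, 4.0, np.nan])

X = sm.add_constant(x)
result = sm.OLS(y, X, missing="drop").fit()
print(result.nobs)  # 3.0 -- only the complete cases were used
```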
My personal opinion is that they must be reported during exploratory data analysis. But before moving to multivariate analysis (regression in particular), such categories can be excluded or merged with system missing to facilitate practical and clear interpretations.
That's a great backdrop to the "modeler v. sampler" issue. I didn't actually do much statistical work (at least not much in estimation) while at Census. I worked on usability studies and HCI experiments there, and on some of their operational efficiency efforts related to data collection and interviewer efficiency. But my training (and pre-PhD experience) is in survey methodology. In that field, we're trained with all those big, expensive surveys as models, and that can become a restricting view. Also, we're primarily trained from a design-based perspective on weighting and adjustment issues. Those who focus mostly on the statistical science of our field include modellers, but I really focus on the social science side of our field. Imputation is the one statistical thing I've dabbled in, but most of my work on missing data (unit or item) is on predicting it from sample unit dispositions, respondent behavior, interviewer behavior, etc.
By the way, I love your APA books on multivariate analysis.
@James,
Yes, I guess I was hinting at propensity classes. I always do better explaining things in plain language than relying only on jargon, so thanks for making that connection.
Just to share per the discussion, the link below is to the only paper I've ever written on imputation, and it was just an applied test (not a simulation). Did this for JSM 2008 while in grad school. We found that, in these data, the imputation method didn't really make much difference. Obviously, the two big things that determine whether imputation helps are a) the missing data rate, and b) the difference between respondents and nonrespondents. If the rate is low, even a big difference between Rs and NRs may not matter. But it can work the other way (the difference between Rs and NRs isn't huge, but the missing data rate is high enough that it creates NR bias). This is all aside from the practical pain in the butt of having missing data.
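A minimal sketch in Python of that rate-times-difference logic (the approximate nonresponse bias of the respondent mean; the numbers are hypothetical):

```python
# Approximate NR bias of the respondent mean:
#   bias ~= missing_rate * (mean of respondents - mean of nonrespondents)
def nr_bias(missing_rate, mean_respondents, mean_nonrespondents):
    return missing_rate * (mean_respondents - mean_nonrespondents)

# Low rate, big R/NR difference: little bias.
print(nr_bias(0.02, 50.0, 40.0))   # 0.2
# High rate, modest R/NR difference: bias large enough to matter.
print(nr_bias(0.40, 50.0, 47.0))   # 1.2
```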
The other link below is to the Survey Research Methods Section (SRMS) of the ASA, where you can find a lot of conference proceedings on survey nonresponse, survey methodology, imputation, weighting, etc.