Can someone assist with machine learning algorithms and methods for data that is NOT missing at random?

Fabrice Clerot

See for instance:

https://www.utexas.edu/cola/centers/prc/_files/cs/Missing-Data.pdf

http://sites.stat.psu.edu/~jls/reprints/schafer_graham_2002.pdf

A useful reminder of the (not so self-explanatory...) terminology for missingness:

Missing Completely at Random (MCAR)
- The probability that a value (y) is missing depends on neither x nor y
- Example: some survey questions are asked only of a simple random subsample of the original sample
Missing at Random (MAR)
- The probability that a value (y) is missing depends on x, but not on y
- Example: Respondents in service occupations are less likely to report income
Missing Not at Random (NMAR, also written MNAR)
- The probability of a missing value depends on the variable that is missing
- Example: Respondents with high income are less likely to report income
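
The three mechanisms above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the binary covariate `x`, the income-like variable `y`, the group means (40 and 60) and the missingness probabilities are all hypothetical and not taken from the references. It compares the mean of the observed values of y with the mean of all values, both overall and within the group `x = True`.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=0):
    """Simulate one missingness mechanism and report the bias of observed means.

    All distributions and probabilities here are illustrative assumptions.
    """
    rng = random.Random(seed)
    all_y, observed_y = [], []
    group_all, group_observed = [], []  # values restricted to x == True
    for _ in range(n):
        x = rng.random() < 0.5                      # hypothetical binary covariate
        y = rng.gauss(60.0 if x else 40.0, 10.0)    # y depends on x
        if mechanism == "MCAR":
            p_missing = 0.3                         # depends on neither x nor y
        elif mechanism == "MAR":
            p_missing = 0.5 if x else 0.1           # depends on x only
        elif mechanism == "MNAR":
            p_missing = 0.5 if y > 50.0 else 0.1    # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        missing = rng.random() < p_missing
        all_y.append(y)
        if x:
            group_all.append(y)
        if not missing:
            observed_y.append(y)
            if x:
                group_observed.append(y)
    return {
        "overall_bias": statistics.fmean(observed_y) - statistics.fmean(all_y),
        "within_group_bias": statistics.fmean(group_observed)
        - statistics.fmean(group_all),
    }


results = {m: simulate(m) for m in ("MCAR", "MAR", "MNAR")}
for name, res in results.items():
    print(name, {k: round(v, 2) for k, v in res.items()})
```

In this toy setup, MCAR should leave both means roughly unbiased, MAR should bias the overall mean but not the mean within a group of x, and NMAR should bias both, which is why conditioning on x is enough under MAR but not under NMAR.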

Matthew G Keffalas

Thank you all for your responses. Looks like I have some reading to do.

I think it will be helpful to write down the actual equations that Fabrice's comments seem to suggest.

Let's assume that Y is a feature and X is the class label (let's restrict ourselves to classification). Essentially, we want to know how the conditional probability P(Y missing | X,Y) behaves:

MCAR:

P(Y missing | X,Y) = P(Y missing)

There is no dependence on X or Y.

MAR:

P(Y missing | X,Y) = P(Y missing | X)

No dependence on the feature value Y, but the probability of missing values does depend on the class label X. In other words, different classes will have different percentages of missing values.

MNAR:

P(Y missing | X,Y) = P(Y missing | Y)

Or do we say that P(Y missing | X,Y) cannot be simplified at all (in other words, the missing probability depends on both X and Y)?

Does all of this look correct?

How do we prove which scenario we're in? I imagine we might be able to figure it out from our knowledge of the data and the underlying experiment/collection that produced it, but can we prove it through certain measurements? I think you could check whether or not we are in the MAR situation pretty easily (just measure P(Y missing | X) and see if it depends on X), but the MNAR situation looks to be more difficult.
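
The measurement of P(Y missing | X) mentioned above could be sketched as follows for a binary class label, using a two-proportion z-test with a normal approximation. The function name `missing_rate_test` and the toy data are hypothetical. One caveat: a clear difference between classes argues against MCAR, but it cannot by itself separate MAR from MNAR, because missingness driven by Y also shows up as a dependence on X whenever X and Y are correlated.

```python
import math


def missing_rate_test(is_missing, labels):
    """Compare the missing rate of Y between two classes.

    Returns (rate_class_a, rate_class_b, two_sided_p_value), where the
    classes are taken in sorted order. Sketch only: exactly two classes,
    normal approximation to the difference of proportions.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    counts = {c: [0, 0] for c in classes}  # class -> [n_missing, n_total]
    for m, c in zip(is_missing, labels):
        counts[c][0] += int(bool(m))
        counts[c][1] += 1
    (m0, n0), (m1, n1) = counts[classes[0]], counts[classes[1]]
    p0, p1 = m0 / n0, m1 / n1
    pooled = (m0 + m1) / (n0 + n1)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    if se == 0:
        return p0, p1, 1.0  # nothing missing, or everything missing
    z = (p1 - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p0, p1, p_value


# Toy data: 100 samples per class, 10 missing in class 0 and 50 in class 1.
labels = [0] * 100 + [1] * 100
is_missing = [True] * 10 + [False] * 90 + [True] * 50 + [False] * 50
rate0, rate1, p_value = missing_rate_test(is_missing, labels)
print(rate0, rate1, p_value)
```

A small p-value here suggests that the missing rate differs between the classes; a large one is compatible with MCAR but does not prove it.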

Masatoshi Sekine

Could you define "Y missing" more precisely? I assume it refers to the missing values of Y; is that correct? Given the relation

the probability of finding values of Y + the probability of missing values of Y = 1,

you should check whether this relation is too simple for your problem. In other words, is there a "don't care" condition related to the class X?
