I have a tabular dataset made of floats which follow a simple rule like:

"if features A and B are in a certain range then target class is 1, otherwise target class is 0."

Since I want some interpretability from my neural network model, I opted for the integrated gradients method implemented in alibi.

Unfortunately, most individual samples don't show A and B as the leading features, as I expected. Even stranger, when I average the attributions over all the individual samples, A and B do get the highest scores. In other words, the local explanations fail but, on average, the global explanation is correct.

Can anyone help me understand why this happens? Isn't the integrated gradients method suitable for tabular datasets?

By the way, my baseline draws each feature uniformly at random between 0 and the maximum of its column. Is that a reasonable choice, or is there a better baseline for tabular data?
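For concreteness, here is a small NumPy sketch of the baseline I described, alongside a per-column mean baseline, which is a common deterministic alternative for tabular data. The data shape and feature names are hypothetical; either array could be passed as the `baselines` argument of alibi's `IntegratedGradients.explain`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabular data: 100 samples, 4 float features (A, B, C, D).
X = rng.random((100, 4))

# My current baseline: for each sample, draw each feature uniformly
# between 0 and that column's maximum value.
random_baseline = rng.uniform(low=0.0, high=X.max(axis=0), size=X.shape)

# A common alternative: a single deterministic baseline per sample,
# the per-column mean (the median is another option).
mean_baseline = np.tile(X.mean(axis=0), (X.shape[0], 1))

assert random_baseline.shape == X.shape
assert (random_baseline >= 0).all()
```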
