Statistical test for Discriminatory Power of Shapley (XAI) Values?

12 June 2022

I've seen posts discussing clustering on labelled datasets. Below is an approach I'm considering. In short, I would cluster the "base data" and the "XAI-calculated" data to see whether the XAI-calculated data discriminates the data better. Does the process below make sense? My specific questions are listed after the steps; in short:

Besides whether the overall flow makes sense, my key question is the last one at the bottom.

That last question asks which statistical test is appropriate for testing my hypothesis that XAI (Shapley) values discriminate the data better than the base data alone. The metrics I would use are summary metrics based on clustering. Can I still perform a statistical test if these summary metrics are aggregates rather than observation-specific values?

BACKGROUND OF PROCESS:

Note: the raw dataset is labelled, i.e., it includes a binary target (outcome) variable.

1. Apply a predictive machine learning model, e.g., random forest, XGBoost, or another ensemble algorithm.

2. Apply PCA (principal component analysis) to determine the top features for clustering later. Should PCA be applied to the model or to the raw data?

3. Calculate an XAI (explainable AI) metric on the predictive model.

4. Cluster (k-means) the XAI metrics based on the top two features identified by a Shapley summary plot. Is k-means clustering OK even though the raw dataset is labelled?

5. Cluster the raw data values from the PCA in step 2.

6. Compare the clustering results of steps 4 and 5 using clustering metrics (completeness, homogeneity, etc.).

7. Apply a statistical inference test, such as a t-test, to see whether the differences between the XAI-generated results and the base-data results are significant. This part is a bit foggy to me; I'm not sure whether it can be done here.
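For concreteness, steps 1-6 could be sketched roughly as below. This is a minimal sketch under several assumptions that are not part of the post: scikit-learn is available, the features are standardised first, and a logistic regression stands in for the random forest/XGBoost model. The substitution is made because a linear model has closed-form Shapley values (on the log-odds scale, treating features as independent and using the data mean as baseline, the Shapley value of feature j for observation i is w_j * (x_ij - mean of x_j)); with a tree ensemble you would typically compute them with the `shap` package's `TreeExplainer` instead. The function name `compare_clusterings` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import completeness_score, homogeneity_score
from sklearn.preprocessing import StandardScaler


def compare_clusterings(X, y, n_clusters=2, seed=0):
    """Hypothetical sketch of steps 1-6: Shapley-value clusters vs. raw-data PCA clusters."""
    # Assumption: standardise so PCA and the linear model see comparable scales.
    X = StandardScaler().fit_transform(X)

    # Step 1: predictive model. Logistic regression is swapped in for RF/XGBoost
    # only because its Shapley values have a closed form (step 3).
    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Step 2: PCA on the raw (scaled) features, not on the model's predictions.
    pca_scores = PCA(n_components=2).fit_transform(X)

    # Step 3: exact Shapley values of the log-odds for a linear model with
    # independent features: phi[i, j] = w[j] * (X[i, j] - mean(X[:, j])).
    w = model.coef_[0]
    shap_values = w * (X - X.mean(axis=0))

    # Step 4: rank features by mean |Shapley value| (the ordering a SHAP summary
    # plot shows), keep the top two, and run k-means on their Shapley values.
    top2 = np.argsort(np.abs(shap_values).mean(axis=0))[::-1][:2]
    km_xai = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    xai_labels = km_xai.fit_predict(shap_values[:, top2])

    # Step 5: k-means on the first two principal-component scores of the raw data.
    km_base = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    base_labels = km_base.fit_predict(pca_scores)

    # Step 6: score each clustering against the known binary labels.
    return {
        name: {
            "homogeneity": homogeneity_score(y, labels),
            "completeness": completeness_score(y, labels),
        }
        for name, labels in (("xai", xai_labels), ("base", base_labels))
    }


if __name__ == "__main__":
    # Synthetic labelled data, for illustration only.
    X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                               random_state=0)
    print(compare_clusterings(X, y))
```

Note that one run yields a single homogeneity and a single completeness value per method, which is exactly the aggregate-metric situation the last question asks about.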

My questions related to the above are the following:

1. When I apply PCA, my understanding is that it is applied to the raw data, not to the predicted output. Is that correct?
2. I assume it is OK to apply (unsupervised) k-means clustering to a labelled dataset if I am comparing clustering on the top two features identified by Shapley values (the XAI metric) with clustering the base data on the top two PCA values. Is that right?
3. To test my hypothesis that Shapley values discriminate the data more effectively than the base data alone, I was going to perform a t-test (or a non-parametric equivalent) comparing the clustering metrics based on Shapley values with the clustering metrics based on the base data. The base data would have no model or XAI applied. Does this approach make sense, given that the clustering metrics are summary-level rather than computed for each observation?
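With a single run, each method gives one summary value, so there is nothing for a t-test to compare. One possible workaround, assumed here and not taken from the post, is to create paired replicates: rerun the clustering pipeline on B bootstrap resamples (or with B random seeds), record the XAI metric and the base metric from the same replicate, and test the paired differences. The sketch below is a non-parametric paired test (a sign-flip permutation test) using only the standard library; the function name `paired_sign_flip_test` is hypothetical. Bootstrap replicates share observations, so any p-value from them should be read as approximate.

```python
import itertools
import random
import statistics


def paired_sign_flip_test(xai_scores, base_scores, n_resamples=10_000, seed=0):
    """Two-sided paired permutation (sign-flip) test on per-replicate metrics.

    xai_scores[k] and base_scores[k] must come from the same replicate k
    (e.g. the same bootstrap resample or random seed), so each pair is matched.
    Returns (mean difference XAI - base, p-value).
    """
    if len(xai_scores) != len(base_scores) or len(xai_scores) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(xai_scores, base_scores)]
    observed = statistics.fmean(diffs)
    n = len(diffs)

    def is_extreme(signs):
        # Under H0 (no difference) each paired difference is equally likely
        # to have either sign, so flip signs and recompute the mean.
        flipped = statistics.fmean(s * d for s, d in zip(signs, diffs))
        return abs(flipped) >= abs(observed) - 1e-12

    if 2 ** n <= n_resamples:
        # Few replicates: enumerate all 2**n sign patterns (exact p-value).
        hits = sum(is_extreme(s) for s in itertools.product((1, -1), repeat=n))
        return observed, hits / 2 ** n

    # Many replicates: Monte Carlo sample of sign patterns; the +1 counts the
    # observed pattern so the p-value is never exactly zero.
    rng = random.Random(seed)
    hits = sum(
        is_extreme([rng.choice((1, -1)) for _ in range(n)])
        for _ in range(n_resamples)
    )
    return observed, (hits + 1) / (n_resamples + 1)
```

If SciPy is available, `scipy.stats.ttest_rel(xai_scores, base_scores)` is the paired t-test on the same replicate lists, and `scipy.stats.wilcoxon(xai_scores, base_scores)` is another non-parametric option.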

Asked by Sue Hl