Welch's test for feature selection?

Samer Sarsam

Hi Dhanunjaya,

AFAIK, Welch's test can be used for feature selection: a t-statistic that is large in absolute value (together with a small p-value) is evidence that the feature's mean differs between the examined classes, so the variable may have enough discriminative power to be included in the classification model.

HTH.

Samer Sarsam, PhD.
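
As a rough illustration of the idea above, here is a minimal sketch that computes Welch's t-statistic and the Welch-Satterthwaite degrees of freedom per feature and ranks features by |t|. The function names (`welch_t`, `rank_features`) and the toy data are assumptions made for this example, not something from the thread, and the p-value step is left out because it needs a t-distribution CDF that the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two group means (sample variance / n)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def rank_features(X, y):
    """Rank feature indices by |t|, largest first.

    X is a list of rows (one row per sample), y a list of 0/1 class labels.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        t, df = welch_t(pos, neg)
        scores.append((j, t, df))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[5.1, 1.0], [4.9, 2.0], [5.3, 1.5], [5.0, 2.5],
     [7.0, 1.2], [6.5, 2.2], [7.4, 1.4], [6.8, 2.4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

for j, t, df in rank_features(X, y):
    print(f"feature {j}: t = {t:.3f}, df = {df:.2f}")
```

Ranking by |t| rather than t treats strongly negative and strongly positive statistics the same way, which matches a two-sided test. Each class needs at least two samples and a non-zero variance for the statistic to be defined.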

Shuichi Shinmura

I cannot understand your question.

Fisher's iris data is a well-known dataset for discrimination.

Could you explain your question in more detail using this data?

Oyebayo Olaniran

Hello Dhanunjaya,

The Welch test has its place in feature selection, where it falls under the filter methods of feature (variable) selection. The feature's mean in one class is tested against its mean in the other class. Because the assumption of homogeneity of variance is expected to be violated, a robust method such as Welch's is needed. A feature is considered relevant for predicting the response class if the p-value returned by the Welch test is less than a threshold. This is where the multiple testing problem comes in: a per-test threshold of, say, 0.05 no longer keeps the overall error at 0.05, because across p features the family-wise error rate can grow to as much as 0.05×p (capped at 1). The remedy is a family-wise error rate (FWER) correction or a false discovery rate (FDR) correction. The standard is to set the FWER at 0.05 or the FDR at 0.1. The significantly relevant features are then those whose adjusted p-values fall below the chosen FWER or FDR level.
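
The correction step described above can be sketched as follows. This is only an illustration under assumptions: Bonferroni is used as one common FWER correction and Benjamini-Hochberg as one common FDR correction (the answer does not name specific procedures), and the function names and p-values are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select(pvals, level, adjust):
    """Indices of features whose adjusted p-value is at or below the level."""
    return [j for j, q in enumerate(adjust(pvals)) if q <= level]


# Hypothetical per-feature p-values from a Welch test
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]

print("FWER 0.05 (Bonferroni):", select(pvals, 0.05, bonferroni))
print("FDR 0.10 (Benjamini-Hochberg):", select(pvals, 0.10, benjamini_hochberg))
```

With thousands of features, the FWER route tends to be much stricter than the FDR route, so which one to use depends on how costly a falsely selected feature is.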

I could not grasp the background of the question at first, but I understood its meaning from the other two answers, so I withdraw my previous answer. First, variable selection in discriminant analysis is very easy to do with regression analysis: we set the target values to 1/-1. I think you may not know this fact. In this case the F-test is used, and there is no advantage in using the t-test or the Welch test.

Since 1970, many medical and statistical researchers have tried to discriminate between cancer and normal patients. Suppose we have 100 patients with 10,000 genes. Medical researchers did not find statistical discriminant functions helpful at all; perhaps they judged statistical methods to be totally useless. There are also foolish studies that regard a gene with a large t-test value as a cancer gene. On the other hand, because statistical researchers have access to high-quality data, there are many studies framed as Big Data, but their results are not clear.

For discriminant analysis, the number of misclassifications (NM) should be the first criterion, but NM has many drawbacks. So I developed a linear discriminant function (LDF) based on the minimum NM (MNM) criterion. In 2015 I analyzed the microarrays used in papers published in Science and other journals from 1999 to 2004, and solved the problem easily in only 54 days. First of all, microarrays are all linearly separable data (LSD); that is, the two groups are completely separated in the high-dimensional gene space. No one had pointed out this important fact. Next, LSD has a Matryoshka structure: it contains smaller linearly separable subspaces nested inside it. Among these, the ones with a small number of genes are called SMs.

Microarrays can then easily be decomposed into approximately 100 pairs of SMs and noise subspaces. In other words, "Big data analysis" can be broken down into small-sample problems of about 100 pairs. It seems that the authors of the previous two answers knew of research that uses the Welch test for variable selection. I think such research is totally meaningless. My results are detailed in my Springer book "New Theory of Discriminant Analysis After R. Fisher" (2016) and in "From Cancer Gene Analysis to Cancer Gene Diagnosis" (2017), available on Amazon. Do not waste your valuable research time on nonsense research themes.

In statistical learning, every task depends on the goal set beforehand. If the goal is to reduce the dimension of the dataset in a binary classification task, the Welch test is one of many alternatives. In fact, the popular NCBI gene expression repository (GEO) uses the Welch test for two-class datasets and ANOVA for multiclass ones. Dhanunjaya can easily confirm this at ncbi.nlm.nih.gov/geo. The "top 250 genes" command uses the Welch t-test for gene ranking.

Thank you, Olaniran.

Your answer is new to me. However, I have found several pairs of SMs that can completely separate cancer and normal patients, or different types of cancer, with nearly 40 genes. If the number of genes is 49 or fewer, a Japanese diagnostic center can run the diagnosis from a blood sample for 100,000 yen or less. I think that testing 250 genes may be expensive and useless. Please tell me your thoughts.

I forgot one important thing. I performed a t-test on the genes included in all the SMs. The t-values are spread out, for example from -10 to nearly 0, and around 10. With the Welch test, I think it is wrong to assign a high possibility of cancer only to genes with positive values. Please tell me what you think about the genes with negative and near-zero values.