A proximity (or similarity) matrix will allow you to compare any number of feature vectors. It will also enable you to calculate a mean similarity or to apply a similarity threshold.
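As a minimal sketch of that idea (the cosine metric, the random example data, and the 0.9 cut-off are all my assumptions, not something fixed by the approach):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data: one row per feature vector
X = np.random.rand(10, 50)

S = cosine_similarity(X)                 # proximity (similarity) matrix, shape (10, 10)

# Mean similarity over the off-diagonal entries
off_diag = S[~np.eye(len(S), dtype=bool)]
mean_similarity = off_diag.mean()

# Flag pairs whose similarity exceeds a threshold
threshold = 0.9
similar_pairs = np.argwhere(np.triu(S, k=1) > threshold)

print(mean_similarity)
print(similar_pairs)
```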
In addition to Abass Olaode's and Nabil el Malki's answers, visualizing data with pair plots might help you to see the relationships between features.
You can check this article: https://towardsdatascience.com/visualizing-data-with-pair-plots-in-python-f228cf529166
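For example, with seaborn (a quick sketch; the iris dataset is just a stand-in for your own feature DataFrame):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical example data; replace with your own feature DataFrame
df = sns.load_dataset("iris")

# One scatter plot per pair of features; the diagonal shows each feature's distribution
sns.pairplot(df, hue="species")
plt.show()
```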
Also, if you want to evaluate the relationship between features, you should check the correlation between them. A feature selection process can make the prediction more accurate.
You can look at this article: https://towardsdatascience.com/why-feature-correlation-matters-a-lot-847e8ba439c4
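A small sketch of inspecting feature correlations with pandas and seaborn (again using iris as a placeholder for your data):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical example data; use your own numeric feature DataFrame instead
df = sns.load_dataset("iris").drop(columns="species")

corr = df.corr(method="pearson")         # pairwise feature correlations

# A heatmap makes highly correlated (potentially redundant) pairs easy to spot
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()
```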
Thank you dear @Abass Olaode, @Nabil el Malki and @Seda Kul for sharing your knowledge. Basically, I am trying to figure out how related features, like series features in malware, can be useful in the feature selection process.
Basically, you are trying to find how much redundancy there is in a given feature subset S. In the literature, there are two main approaches widely used to quantify the redundancy in a feature subset S: 1) quantifying the redundancy of S without considering an objective concept, and 2) quantifying the redundancy of S considering an objective concept. In the first case, the aim is only to measure the degree of correlation, dependence, similarity, or association (commonly in pairs) among the features in S. In the second case, the aim is to quantify the relationship among the features in S while also considering a specific task or objective concept for which these features could be considered redundant. In your case, given that you are performing a supervised classification task, your objective concept will be the class labels.
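As an illustrative sketch of the two views (not taken from the works cited below; the synthetic data and the choice of mutual information for the second case are my assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Hypothetical data: 300 samples, 8 features, binary class labels
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Case 1: redundancy without an objective concept -> pairwise feature correlation
corr_matrix = np.corrcoef(X, rowvar=False)

# Case 2: redundancy with respect to an objective concept -> each feature's
# relationship to the class labels y, here measured with mutual information
mi = mutual_info_classif(X, y, random_state=0)

print(corr_matrix.round(2))
print(mi.round(3))
```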
The notion of feature redundancy is usually considered in terms of feature correlation; this correlation can be quantified using any measure of similarity, dependency, or association among the features, and it is widely accepted that two features are redundant to each other if their values are highly correlated. Therefore, two features f1 and f2 are redundant if corr(f1, f2) > beta, where beta is a predefined threshold.
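A minimal sketch of that rule (beta = 0.9 and the iris data are assumptions for illustration):

```python
import seaborn as sns

# Hypothetical example data; use your own feature DataFrame instead
df = sns.load_dataset("iris").drop(columns="species")
beta = 0.9                               # predefined threshold (an assumption)

corr = df.corr().abs()
redundant = [
    (f1, f2)
    for i, f1 in enumerate(corr.columns)
    for f2 in corr.columns[i + 1:]
    if corr.loc[f1, f2] > beta
]
print(redundant)                         # pairs of features redundant to each other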
Some useful works on this regard can be found in:
- Yu, L., Liu, H., 2004. Efficient Feature Selection via Analysis of Relevance and Redundancy. Journal of Machine Learning Research 5, 1205–1224.
- Auffarth, B., López, M., Cerquides, J., 2010. Comparison of redundancy and relevance measures for feature selection in tissue classification of CT images. Lecture Notes in Artificial Intelligence, vol. 6171, pp. 248–262.
A good way to find the similarity between two features is to build a TOM (Topological Overlap Matrix). A high value means high similarity. Features with 95% similarity are most probably redundant.
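A rough sketch of an unsigned TOM built from an adjacency matrix of absolute correlations (the standard formulation popularized by WGCNA; the soft-thresholding power of 6 and the random data are assumptions):

```python
import numpy as np

def tom_matrix(X, power=6):
    """Unsigned Topological Overlap Matrix for the features (columns) of X."""
    a = np.abs(np.corrcoef(X, rowvar=False)) ** power   # adjacency matrix
    np.fill_diagonal(a, 0)
    shared = a @ a                       # similarity shared through common neighbours
    k = a.sum(axis=1)                    # connectivity of each feature
    denom = np.minimum.outer(k, k) + 1 - a
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

# Hypothetical data: 200 samples, 6 features
X = np.random.default_rng(0).random((200, 6))
print(tom_matrix(X).round(2))            # values near 1 suggest redundant features
```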
According to Harrell's guidelines [1], you can perform the following steps to detect the correlation between two or more features:
1. Correlation analysis: analyze the correlation between each pair of features using the Spearman rank correlation test.
2. Independence analysis: analyze independent features. For each binary or nominal feature, you can use the chi-squared test of independence to analyze the statistical dependence of the feature on the other features (a sketch of steps 1 and 2 follows the reference below).
3. Redundancy analysis: analyze features that can be predicted by a combination of the other features. You can use the redun function in the rms R package.
[1]: F. E. Harrell. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, 2001.
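The redun function itself lives in R's rms package; as a hedged Python sketch of steps 1 and 2 only (the synthetic features are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, chi2_contingency

# Hypothetical data: two numeric features and two nominal features
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.3, size=200)
c1 = rng.integers(0, 2, size=200)        # binary feature
c2 = rng.integers(0, 3, size=200)        # nominal feature with 3 levels

# Step 1: Spearman rank correlation between a pair of features
rho, p_corr = spearmanr(x1, x2)

# Step 2: chi-squared test of independence between nominal features
chi2, p_ind, dof, expected = chi2_contingency(pd.crosstab(c1, c2))

print(rho, p_corr)
print(chi2, p_ind)
```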
For more details, you can refer to my papers as follows:
- The Impact of Mislabeled Changes by SZZ on Just-in-Time Defe...
- Chaff from the Wheat: Characterizing and Determining Valid Bug Reports
We have proposed a new unsupervised feature selection method, where the relationships between features are defined according to their ability to discriminate clusters, based on the subspace learning concept (the properties of the projected clusters):
- Unsupervised graph-based feature selection via subspace and ...
We propose a graph-based feature selection method which can effectively measure and evaluate the relations between features (A graph theoretic approach for unsupervised feature selection).
In the first step of the proposed method, the feature set is represented as a weighted graph in which each node denotes a feature and each edge weight indicates the similarity value between its corresponding features. In the second step, the features are divided into several clusters using a community detection method. The goal of feature clustering is to group the most correlated features into the same cluster. In the third step, a novel algorithm based on node centrality is proposed to select the best representative features from each cluster.
A preliminary step for all graph-based methods is to establish a graph over the training data. Thus, we model the feature selection problem using a graph-theoretic representation. In this work, we have used the well-known Pearson product-moment correlation coefficient to measure the similarity between different features of a given training set.
In the second step, quite differently from existing feature clustering algorithms, a community detection method is applied to cluster the features. Detecting communities in the weighted graph is significant for understanding the graph structure and analyzing the relations between features.
And finally, in the third step, relevant and influential features from each cluster are identified using node centrality.
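As an illustrative sketch of these three steps with networkx, not the authors' exact algorithm: greedy modularity communities stand in for the community detection of step 2, degree centrality for step 3, and the 0.5 correlation cut-off and synthetic data are assumptions:

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

# Hypothetical data: 9 features forming 3 correlated groups (3 features each)
rng = np.random.default_rng(0)
base = rng.normal(size=(300, 3))
X = np.hstack([base[:, [g]] + 0.3 * rng.normal(size=(300, 3)) for g in range(3)])

# Step 1: weighted feature graph from Pearson correlations
corr = np.abs(np.corrcoef(X, rowvar=False))
n = X.shape[1]
G = nx.Graph()
G.add_nodes_from(range(n))
for i in range(n):
    for j in range(i + 1, n):
        if corr[i, j] > 0.5:             # keep only strongly related pairs
            G.add_edge(i, j, weight=corr[i, j])

# Step 2: group correlated features with community detection
communities = community.greedy_modularity_communities(G, weight="weight")

# Step 3: keep the most central feature of each community as its representative
centrality = nx.degree_centrality(G)
selected = [max(c, key=centrality.get) for c in communities]
print(selected)
```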