A proximity (or similarity) matrix will allow you to compare any number of feature vectors. It will also enable you to calculate a mean similarity or to apply a similarity threshold.
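As a minimal sketch of that idea (the cosine metric, the random example data, and the 0.9 cut-off are all my assumptions, not something fixed by the approach):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data: one row per feature vector
X = np.random.rand(10, 50)

S = cosine_similarity(X)                 # proximity (similarity) matrix, shape (10, 10)

# Mean similarity over the off-diagonal entries
off_diag = S[~np.eye(len(S), dtype=bool)]
mean_similarity = off_diag.mean()

# Flag pairs whose similarity exceeds a threshold
threshold = 0.9
similar_pairs = np.argwhere(np.triu(S, k=1) > threshold)

print(mean_similarity)
print(similar_pairs)
```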
In addition to Abass Olaode's and Nabil el Malki's answers, visualizing data with pair plots might help you to see the relationships between features.
You can check this article: https://towardsdatascience.com/visualizing-data-with-pair-plots-in-python-f228cf529166
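For example, with seaborn (a quick sketch; the iris dataset is just a stand-in for your own feature DataFrame):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical example data; replace with your own feature DataFrame
df = sns.load_dataset("iris")

# One scatter plot per pair of features; the diagonal shows each feature's distribution
sns.pairplot(df, hue="species")
plt.show()
```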
Also, if you want to evaluate the relationship between features, you should check the correlation between them. A feature selection process can make the prediction more accurate.
You can look at this article: https://towardsdatascience.com/why-feature-correlation-matters-a-lot-847e8ba439c4
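A small sketch of inspecting feature correlations with pandas and seaborn (again using iris as a placeholder for your data):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical example data; use your own numeric feature DataFrame instead
df = sns.load_dataset("iris").drop(columns="species")

corr = df.corr(method="pearson")         # pairwise feature correlations

# A heatmap makes highly correlated (potentially redundant) pairs easy to spot
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()
```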
Thank you dear @Abass Olaode, @Nabil el Malki and @Seda Kul for sharing your knowledge. Basically, I am trying to figure out how related features, like series features in malware, can be useful in the feature selection process.
Basically, you are trying to find how much redundancy there is in a given feature subset S. In the literature, there are two main approaches widely used to quantify the redundancy in a feature subset S: 1) quantifying the redundancy of S without considering an objective concept, and 2) quantifying the redundancy of S considering an objective concept. In the first case, the aim is only to measure the degree of correlation, dependence, similarity, or association (commonly in pairs) among the features in S. In the second case, the aim is to quantify the relationship among the features in S while also considering a specific task or objective concept for which these features could be considered redundant. In your case, given that you are performing a supervised classification task, your objective concept will be the class labels.
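As an illustrative sketch of the two views (not taken from the works cited below; the synthetic data and the choice of mutual information for the second case are my assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Hypothetical data: 300 samples, 8 features, binary class labels
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Case 1: redundancy without an objective concept -> pairwise feature correlation
corr_matrix = np.corrcoef(X, rowvar=False)

# Case 2: redundancy with respect to an objective concept -> each feature's
# relationship to the class labels y, here measured with mutual information
mi = mutual_info_classif(X, y, random_state=0)

print(corr_matrix.round(2))
print(mi.round(3))
```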
The notion of feature redundancy is usually considered in terms of feature correlation; this correlation can be quantified using any measure of similarity, dependency, or association among the features, and it is widely accepted that two features are redundant to each other if their values are highly correlated. Therefore, two features f1 and f2 are redundant if corr(f1, f2) > beta, where beta is a predefined threshold.
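A minimal sketch of that rule (beta = 0.9 and the iris data are assumptions for illustration):

```python
import seaborn as sns

# Hypothetical example data; use your own feature DataFrame instead
df = sns.load_dataset("iris").drop(columns="species")
beta = 0.9                               # predefined threshold (an assumption)

corr = df.corr().abs()
redundant = [
    (f1, f2)
    for i, f1 in enumerate(corr.columns)
    for f2 in corr.columns[i + 1:]
    if corr.loc[f1, f2] > beta
]
print(redundant)                         # pairs of features redundant to each other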
Some useful works on this regard can be found in:
- Yu, L., Liu, H., 2004. Efficient Feature Selection via Analysis of Relevance and Redundancy. Journal of Machine Learning Research 5, 1205–1224.
- Auffarth, B., López, M., Cerquides, J., 2010. Comparison of redundancy and relevance measures for feature selection in tissue classification of CT images. Lecture Notes in Artificial Intelligence, vol. 6171, pp. 248–262.
A good way to find the similarity between two features is to build a TOM (Topological Overlap Matrix). A high value means high similarity. Features with 95% similarity are most probably redundant.
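A rough sketch of an unsigned TOM built from an adjacency matrix of absolute correlations (the standard formulation popularized by WGCNA; the soft-thresholding power of 6 and the random data are assumptions):

```python
import numpy as np

def tom_matrix(X, power=6):
    """Unsigned Topological Overlap Matrix for the features (columns) of X."""
    a = np.abs(np.corrcoef(X, rowvar=False)) ** power   # adjacency matrix
    np.fill_diagonal(a, 0)
    shared = a @ a                       # similarity shared through common neighbours
    k = a.sum(axis=1)                    # connectivity of each feature
    denom = np.minimum.outer(k, k) + 1 - a
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

# Hypothetical data: 200 samples, 6 features
X = np.random.default_rng(0).random((200, 6))
print(tom_matrix(X).round(2))            # values near 1 suggest redundant features
```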
According to Harrell's guidelines [1], you can perform the following steps to detect the correlation between two or more features:
1. Correlation analysis: analyze the correlation between each pair of features using the Spearman rank correlation test.
2. Independence analysis: analyze independent features. For each binary or nominal feature, you can use the chi-squared test of independence to analyze the statistical dependence of the feature on the other features (a sketch of steps 1 and 2 follows the reference below).
3. Redundancy analysis: analyze features that can be predicted by a combination of the other features. You can use the redun function in the rms R package.
[1]: F. E. Harrell. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, 2001.
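The redun function itself lives in R's rms package; as a hedged Python sketch of steps 1 and 2 only (the synthetic features are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, chi2_contingency

# Hypothetical data: two numeric features and two nominal features
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.3, size=200)
c1 = rng.integers(0, 2, size=200)        # binary feature
c2 = rng.integers(0, 3, size=200)        # nominal feature with 3 levels

# Step 1: Spearman rank correlation between a pair of features
rho, p_corr = spearmanr(x1, x2)

# Step 2: chi-squared test of independence between nominal features
chi2, p_ind, dof, expected = chi2_contingency(pd.crosstab(c1, c2))

print(rho, p_corr)
print(chi2, p_ind)
```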
For more details, you can refer to my papers as follows:
- The Impact of Mislabeled Changes by SZZ on Just-in-Time Defe...
- Chaff from the Wheat: Characterizing and Determining Valid Bug Reports
We have proposed a new unsupervised feature selection method, where the relationships between features are defined according to their ability to discriminate clusters, based on the subspace learning concept (the properties of the projected clusters):
- Unsupervised graph-based feature selection via subspace and ...
We propose a graph-based feature selection method which can effectively measure and evaluate the relations between features (A graph theoretic approach for unsupervised feature selection).
In the first step of the proposed method, the feature set is represented as a weighted graph in which each node denotes a feature and each edge weight indicates the similarity value between its corresponding features. In the second step, the features are divided into several clusters using a community detection method. The goal of feature clustering is to group the most correlated features into the same cluster. In the third step, a novel algorithm based on node centrality is proposed to select the best representative features from each cluster.
A preliminary step for all graph-based methods is to establish a graph over the training data. Thus, we model the feature selection problem using a graph-theoretic representation. In this work, we have used the well-known Pearson product-moment correlation coefficient to measure the similarity between different features of a given training set.
In the second step, quite differently from existing feature clustering algorithms, a community detection method is applied to cluster the features. Detecting communities in the weighted graph is significant for understanding the graph structure and analyzing the relations between features.
And finally, in the third step, relevant and influential features from each cluster are identified using node centrality.
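As an illustrative sketch of these three steps with networkx, not the authors' exact algorithm: greedy modularity communities stand in for the community detection of step 2, degree centrality for step 3, and the 0.5 correlation cut-off and synthetic data are assumptions:

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

# Hypothetical data: 9 features forming 3 correlated groups (3 features each)
rng = np.random.default_rng(0)
base = rng.normal(size=(300, 3))
X = np.hstack([base[:, [g]] + 0.3 * rng.normal(size=(300, 3)) for g in range(3)])

# Step 1: weighted feature graph from Pearson correlations
corr = np.abs(np.corrcoef(X, rowvar=False))
n = X.shape[1]
G = nx.Graph()
G.add_nodes_from(range(n))
for i in range(n):
    for j in range(i + 1, n):
        if corr[i, j] > 0.5:             # keep only strongly related pairs
            G.add_edge(i, j, weight=corr[i, j])

# Step 2: group correlated features with community detection
communities = community.greedy_modularity_communities(G, weight="weight")

# Step 3: keep the most central feature of each community as its representative
centrality = nx.degree_centrality(G)
selected = [max(c, key=centrality.get) for c in communities]
print(selected)
```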