Intuitive visualization of "significance" and "strength" of dependence?

22 August 2019

Hello,

a seemingly simple design question: The aim is to visualize the dependence of A and B by connecting A and B by a straight line (possibly with a label). The design options are: line type, line strength, text or symbolic label.

How would you visualize the "significance" and/or "strength" of the dependence?

Details:

- A and B are either independent (no line) or dependent. They are considered dependent if the p-value of a test of independence (the "significance") is small, which in each setting corresponds to a certain value of a test statistic.

- The "strength" of dependence of A and B might be given on a scale, e.g. [-1,1] if one considers classical correlation.

(The use of colour is a further design option, which breaks down in black and white print. Therefore it was excluded.)

### all below can be skipped, it provides only further details for the reader interested in the background of the question ###

The detection of dependence and its quantification are usually separate procedures, thus a mixture of both might be confusing...
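
A small sketch of why the two should be read separately (simulated data, base R only; the seed, sample size and slope are arbitrary choices for illustration and are not taken from the paper): with a large sample, a weak linear dependence can still be highly significant.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
n <- 10000                   # large sample
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)      # weak linear dependence of y on x

ct <- cor.test(x, y)         # Pearson correlation test from the stats package

strength     <- unname(ct$estimate)  # correlation estimate: close to 0.1, i.e. weak
significance <- ct$p.value           # p-value: very small, i.e. clearly significant

print(c(strength = strength, significance = significance))
```

A line whose thickness encodes only one of these two numbers would look very different from a line encoding the other, which is the source of the possible confusion.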

Background:

Apart from many other new contributions, the paper arXiv:1712.06532 introduces a visualization scheme for higher order dependencies (including consistent estimators for the dependence structure).

Based on feedback there seems to be a tendency to interpret the method/visualization by a wrong intuition (rather than by its description given in the paper)... so I wonder if this can be moderated by an improved visualization.

If you want to test your intuition, use in R:

install.packages("multivariance")
library(multivariance)

# each call below performs dependence structure detection on a sample dataset
dependence.structure(dep_struct_several_26_100, alpha = 0.001)
dependence.structure(dep_struct_star_9_100, alpha = 0.01)
dependence.structure(dep_struct_ring_15_100, alpha = 0.01)

The current visualization does NOT include the "strength" of dependence, but that is what some readers seem to believe they see.

The paper is concerned with dependencies of higher order, so it goes beyond the simple initial example of this question. Still, it depicts dependencies by lines and usually uses the value of the test statistic as a label. Redundancy is introduced by using colour, line type and in certain cases also the label to denote the order of dependence.

It seems that using the value of the test statistic as a label causes confusion. The fastest detection method is based on conservative tests; in this setting there is a one-to-one correspondence (independent of sample sizes and marginal distributions) between the value of the test statistic and the p-value, so it provides a very reasonable label (for the educated user). In general, the value of the test statistic gives only a rough indication of the significance.

A further comment on the distinction between "significance" and "strength": the paper also introduces several variants of correlation-like measures, which are just scaled versions of the test statistics. Thus (for a fixed sample size and fixed marginals) there is also a one-to-one correspondence between the "strength" and the conservative "significance". These measures also satisfy certain dependence measure axioms. But one should keep in mind that these axioms are not sufficient to provide a sensible interpretation of different (or identical) values of the "strength" in general (e.g., when varying the marginal distributions). That is why all methods are currently based on "significance".

Albert Vexler

I would suggest our paper "Multi-Panel Kendall Plot in Light of an ROC Curve Analysis Applied to Measuring Dependence".

Hume F. Winzar

An interesting question, Albert.

I'm not going to pretend that I fully understand all that's in the Björn Böttcher paper that you refer to. Albert Vexler's general index of dependence looks like an excellent method for testing your proposed relationships, but it doesn't give the visualisation that you propose. Here is one idea (and I'm happy to accept criticism from any of our readers).

It seems to me that you should regard the measurement/estimation of dependence as quite distinct from the visualisation of that dependence. Run your analysis and visualisation as two separate steps. So you have an estimate of whether and how much A is correlated with B; and then you visualise that correlation.

Your proposed visualisation sounds a lot like a straightforward graph that might be used for visualising a Network. You have Nodes A and B with an edge (link or arrow) between the two nodes. The edge may be directed (an arrow indicating direction of communication or influence) or undirected. The colour and thickness of the edge can indicate the strength of the relationship.

Usually you need two sets of data:

- a list of nodes with IDs and names
- a list of edges with node IDs for start and end, and size or strength measures.
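
As a minimal sketch of these two tables (all names, coordinates and strength values below are made up for illustration), here is a version in base R that avoids any network package. Line width encodes the absolute strength and line type encodes its sign, so it also works in black and white print.

```r
# Hypothetical node list: ID, display name, and plot coordinates.
nodes <- data.frame(
  id   = c(1, 2, 3),
  name = c("A", "B", "C"),
  x    = c(0, 1, 0.5),
  y    = c(0, 0, 0.8)
)

# Hypothetical edge list: start and end node IDs plus a strength in [-1, 1].
edges <- data.frame(
  from     = c(1, 2),
  to       = c(2, 3),
  strength = c(0.8, -0.3)
)

# Look up the coordinates of each edge's end points.
start <- nodes[match(edges$from, nodes$id), ]
end   <- nodes[match(edges$to,   nodes$id), ]

# Width encodes |strength|; type encodes the sign (1 = solid, 2 = dashed).
edges$lwd <- 1 + 4 * abs(edges$strength)
edges$lty <- ifelse(edges$strength >= 0, 1, 2)

# Empty canvas, then edges, then nodes drawn on top of the edges.
plot(nodes$x, nodes$y, type = "n", axes = FALSE, xlab = "", ylab = "",
     xlim = c(-0.2, 1.2), ylim = c(-0.2, 1))
segments(start$x, start$y, end$x, end$y, lwd = edges$lwd, lty = edges$lty)
points(nodes$x, nodes$y, pch = 21, bg = "white", cex = 4)
text(nodes$x, nodes$y, labels = nodes$name)
```

The packages covered in the tutorials below take essentially the same two tables as input and handle the layout automatically.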

There are several Network analysis and visualisation packages for R. These two tutorials will get you started:

https://kateto.net/network-visualization
https://www.jessesadler.com/post/network-analysis-with-r/

If you really want to make your visualisation look pretty, then use Gephi: https://gephi.org/

Björn Böttcher

Dear Albert, dear Hume,

thank you for your comments.

Albert's method seems to provide a tool to assess bivariate dependence via a specific visualization method. But since it is tied to a specific "test", it is not really what I am interested in, although there have also been other papers suggesting the use of plots as labels for the edges in a graph visualizing dependence...

"Your proposed visualisation sounds a lot like a straightforward graph that might be used for visualising a Network."

Absolutely, and in this setting the visualization of "strength" via colour or thickness seems standard. Thus a restriction to "strength" avoids confusion, but what if we also want to illustrate the "significance"...

Sometimes ., *, **, *** are used to visualize (or label) p-value ranges. This method is maybe not directly intuitive, but using a symbol which obviously requires an explanatory text also might avoid wrong intuitions...
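
For illustration, such a labelling could be sketched in base R as follows. The function name is hypothetical, and the cutpoints are the ones commonly printed in R model summaries (0.001, 0.01, 0.05, 0.1); they are an assumption here, not something prescribed by the paper.

```r
# Hypothetical helper: map p-values to the usual significance symbols.
p_to_symbol <- function(p) {
  # Intervals are right-closed, e.g. (0.001, 0.01] maps to "**";
  # include.lowest = TRUE makes p = 0 fall into the first interval.
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", ""),
                   include.lowest = TRUE))
}

p_to_symbol(c(0.0004, 0.03, 0.2))   # returns "***", "*", ""
```

The returned strings can be used directly as edge labels.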

-> I wonder if there are other ways used.

Formally, if "no line" (interpretable as line of thickness 0) indicates that a dependence is not significant, a line thickness corresponding to significance seems natural - but maybe not intuitive... (intuition being maybe based on previous knowledge of correlation networks).
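
One possible way to make this concrete, as a sketch only: take the line width proportional to -log10 of the p-value, capped at a maximum, and set it to 0 above the significance level. The function name, the level alpha = 0.05 and the cap are arbitrary assumptions for illustration.

```r
# Hypothetical helper: map p-values to line widths.
p_to_lwd <- function(p, alpha = 0.05, max_lwd = 6) {
  lwd <- pmin(-log10(p), max_lwd)  # smaller p-value -> thicker line, capped
  lwd[p > alpha] <- 0              # not significant -> "thickness 0"
  lwd
}

p    <- c(0.5, 0.01, 1e-10)        # made-up p-values for three edges
lwd  <- p_to_lwd(p)
keep <- lwd > 0                    # edges that should actually be drawn

print(data.frame(p = p, lwd = lwd, keep = keep))
```

Edges with width 0 are best dropped before plotting rather than passed to the graphics device, since the handling of very small line widths is device-specific in R.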

A trivial solution which avoids misconceptions would be to always add a legend...

-> Thus once again, I wonder if there are other ways used.

Best wishes,

Björn

Albert Vexler

Thank you, folks. It is a very nice discussion. It seems very reasonable to plot the likelihood f(X,Y) versus f_x(X)f_y(Y), or to do the same with respect to the corresponding distribution functions. There are too many questions here, e.g. regarding positive, negative etc. characteristics of dependence. There is one aspect of multivariate dependence in our paper. I just wanted to say that “correlation” is not “dependence” in general. Also, I would suggest our papers “Expected P-values in Light of an ROC Curve Analysis Applied to Optimal Multiple Testing Procedures” and “To t-Test or not To t-test?: a P-Values-based Point of View in the ROC Curve Framework” in the context of Björn’s nice comments regarding the p-value based approach. The paper “Dependence and independence: Structure and inference” may be interesting too.

Sumra haleem Shaikh

Go to the paper; that will definitely help you in this regard.
