Hello,

a seemingly simple design question: the aim is to visualize the dependence of A and B by connecting them with a straight line (possibly with a label). The design options are line type, line width, and a text or symbolic label.

How would you visualize the "significance" and/or "strength" of the dependence?

Details:

- A and B are either independent (no line) or dependent. They are considered dependent if the p-value of an independence test (the "significance") is small, i.e. if the test statistic exceeds a threshold that depends on the setting.

- The "strength" of dependence of A and B might be given on a scale, e.g. [-1,1] if one considers classical correlation.

(Colour would be a further design option, but it breaks down in black-and-white print and was therefore excluded.)
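One possible black-and-white encoding, as a minimal sketch in base R: the line type carries the "significance" and the line width carries the "strength", so the two quantities use separate visual channels. The helper names `edge_style` and `draw_pair`, the thresholds and the width scaling are all hypothetical choices for illustration, not something taken from the paper or the package.

```r
# Hypothetical helper: map a p-value and a strength in [-1, 1] to
# black-and-white line attributes. Thresholds and widths are illustrative.
edge_style <- function(p_value, strength, alpha = 0.05) {
  stopifnot(p_value >= 0, p_value <= 1, abs(strength) <= 1)
  if (p_value >= alpha) {
    return(NULL)  # treated as independent: no line is drawn
  }
  list(
    # significance -> line type (solid for very small p-values)
    lty = if (p_value < alpha / 10) "solid" else "dashed",
    # strength -> line width (absolute value, the sign goes into the label)
    lwd = 1 + 4 * abs(strength),
    label = sprintf("%s%.2f", if (strength < 0) "-" else "+", abs(strength))
  )
}

# Hypothetical helper: draw A and B and, if dependent, the connecting line.
draw_pair <- function(p_value, strength, alpha = 0.05) {
  style <- edge_style(p_value, strength, alpha)
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, 1))
  if (!is.null(style)) {
    segments(0.2, 0.5, 0.8, 0.5, lty = style$lty, lwd = style$lwd)
    text(0.5, 0.58, style$label)
  }
  # white-filled circles drawn last so the line ends are hidden behind them
  points(c(0.2, 0.8), c(0.5, 0.5), pch = 21, bg = "white", cex = 4)
  text(c(0.2, 0.8), c(0.5, 0.5), c("A", "B"))
  invisible(style)
}
```

With these illustrative settings, `draw_pair(p_value = 0.003, strength = -0.4)` draws a solid line of width 2.6 labelled "-0.40", while a p-value at or above `alpha` draws no line at all.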

### Everything below can be skipped; it only provides further details for readers interested in the background of the question ###

The detection of dependence and its quantification are usually separate procedures, thus a mixture of both might be confusing...

Background:

Among many other new contributions, the paper arXiv:1712.06532 introduces a visualization scheme for higher order dependencies (including consistent estimators for the dependence structure).

Based on feedback, there seems to be a tendency to interpret the method and its visualization by a wrong intuition (rather than by the description given in the paper), so I wonder whether this can be mitigated by an improved visualization.

If you want to test your intuition, run the following in R, which performs dependence structure detection on sample datasets:

install.packages("multivariance")
library(multivariance)

dependence.structure(dep_struct_several_26_100, alpha = 0.001)
dependence.structure(dep_struct_star_9_100, alpha = 0.01)
dependence.structure(dep_struct_ring_15_100, alpha = 0.01)

The current visualization does NOT include the "strength" of dependence, but some readers seem to believe that this is what they see.

The paper is concerned with dependencies of higher order, so it goes beyond the simple initial example of this question. Still, it depicts dependencies by lines and usually uses the value of the test statistic as the label. Redundancy is introduced by using colour, line type and, in certain cases, also the label to denote the order of dependence.

It seems that using the value of the test statistic as the label causes confusion. The fastest detection method is based on conservative tests. In this setting there is a one-to-one correspondence (independent of sample size and marginal distributions) between the value of the test statistic and the p-value, so the test statistic is a very reasonable label (for the educated user). In general, however, the value of the test statistic gives only a rough indication of the significance.
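To illustrate what such a one-to-one correspondence could look like, here is a rough sketch in base R. It assumes that the conservative p-value is the upper tail of a chi-squared distribution with one degree of freedom evaluated at the test statistic. That is my reading of the distribution-free bound and should be checked against the paper before relying on it. The function names and the star thresholds are hypothetical.

```r
# Assumption: conservative p-value = upper tail of a chi-squared
# distribution with 1 degree of freedom at the test statistic.
conservative_pvalue <- function(test_statistic) {
  stopifnot(all(test_statistic >= 0))
  pchisq(test_statistic, df = 1, lower.tail = FALSE)
}

# Hypothetical symbolic label derived from the p-value; the cut points
# are the common star convention, not something defined in the paper.
significance_label <- function(p_value) {
  as.character(cut(p_value,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                   labels = c("***", "**", "*", "n.s.")))
}

# Usage: turn test statistics into p-values and symbolic labels
stats <- c(1, 5, 12)
data.frame(statistic = stats,
           p_value = conservative_pvalue(stats),
           label = significance_label(conservative_pvalue(stats)))
```

Because the map is strictly decreasing, labelling a line with the p-value (or a symbol derived from it) carries the same information as the test statistic, and it may be harder to mistake for a "strength".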

A further comment on the distinction between "significance" and "strength": the paper also introduces several variants of correlation-like measures, which are just scaled versions of the test statistics. Thus (for a fixed sample size and fixed marginals) there is also a one-to-one correspondence between the "strength" and the conservative "significance". These measures also satisfy certain dependence measure axioms. But one should keep in mind that these axioms are not sufficient to give a sensible interpretation of different (or identical) values of the "strength" in general (e.g., when varying the marginal distributions). That is why all methods are currently based on "significance".
