I need to establish whether there is a link between two columns that come from two different datasets sharing one matching column. The datasets are:

Dataset1: bipartite: (M, DS)

| M | DS |
|---|---|
| m23 | ds3 |
| m23 | ds67 |
| m54 | ds325 |
| ... | ... |

Dataset2: tripartite: (M, G, DG)

| M | G | DG |
|---|---|---|
| m23 | g6 | dg32 |
| m23 | g8 | dg1 |
| m54 | g32 | dg65 |
| ... | ... | ... |

These two datasets have one column in common (i.e., **M**), and the relationships among the elements are shown below:

```

M ----affects----> G

M ----causes-----> DS

DG ----affects----> M

```

Primary goal: to calculate the probability that a link/edge exists between indirectly related columns (e.g., **DG** and **DS**) via the common column (**M**).

So, for a given list of DS entries, how can I find the probability that a link/edge exists between each selected DS and every DG?

```

DS DG

```

If the DS entries (ds3, ds67) were selected, the output should look like this, one line per pair:

element1 - element2 - probability or statistical value signifying that a direct relationship or link exists.

```

ds3 - dg32 - 100% (common M value)

ds3 - dg1 - 100% (common M value)

ds3 - dg65 - 43.66%

---

ds67 - dg32 - 100% (common M value)

ds67 - dg1 - 100% (common M value)

ds67 - dg65 - 55.12%

```
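To make the "common M value" case concrete, here is a minimal Python sketch, assuming the two datasets fit in memory as lists of tuples. The function name `shared_m_pairs` and the variable names are just placeholders for illustration. It only finds which DS-DG pairs share an M value (the 100% rows above); it does not attempt to score the pairs without a common M, which is the part I am asking about.

```python
from collections import defaultdict

# Toy rows copied from the tables above (the "..." rows are omitted).
dataset1 = [("m23", "ds3"), ("m23", "ds67"), ("m54", "ds325")]  # (M, DS)
dataset2 = [
    ("m23", "g6", "dg32"),
    ("m23", "g8", "dg1"),
    ("m54", "g32", "dg65"),
]  # (M, G, DG)


def shared_m_pairs(ds_rows, dg_rows, selected_ds):
    """Map each (DS, DG) pair to the set of M values the two have in common."""
    m_by_ds = defaultdict(set)
    for m, ds in ds_rows:
        m_by_ds[ds].add(m)

    m_by_dg = defaultdict(set)
    for m, _g, dg in dg_rows:
        m_by_dg[dg].add(m)

    pairs = {}
    for ds in selected_ds:
        for dg in sorted(m_by_dg):
            # An empty intersection means the pair is only indirectly related.
            pairs[(ds, dg)] = m_by_ds[ds] & m_by_dg[dg]
    return pairs


pairs = shared_m_pairs(dataset1, dataset2, ["ds3", "ds67"])
for (ds, dg), common in pairs.items():
    label = "common M value" if common else "no common M, score still needed"
    print(ds, "-", dg, "-", label)
```

For the pairs with an empty intersection (e.g., ds3 and dg65), I do not know which measure would give a meaningful probability.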

I am trying to code this in Java, but Python-based solutions would work too.

I am sorry that I am not very familiar with graph theory, so a somewhat descriptive solution would be really appreciated.

Thanks.
