Hi guys, I'm currently studying mutual information (MI) as applied in classification techniques. Could anyone share how to compute it on a simple dataset? Thanks in advance.
How can one prove that I(x, y; z) = 0? Can it be proved by listing all 8 cases of x, y, and z (000 to 111), and how would the correlation coefficient be calculated?
I think there may be a mistake. Please correct me if I'm wrong, but Javier's probabilities P(x,y) are actually conditional probabilities, namely P(y|x). E.g., his P(a,1) = 2/4 because, given Event1 = a, 2 of the 4 possibilities have Event2 = 1.
However, P(x,y) should actually be the joint probability, namely P(a,1) = 2/9 (2 occurrences of (a,1) out of 9 observations in total).
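A minimal sketch of the distinction, assuming a hypothetical 9-row dataset consistent with the counts above (Javier's actual table isn't reproduced here, so the rows other than the (a, ·) ones are made up):

```python
from collections import Counter

# Hypothetical data: 9 pairs (Event1, Event2), 4 of which have
# Event1 == 'a', and 2 of those have Event2 == 1.
pairs = [('a', 1), ('a', 1), ('a', 0), ('a', 0),
         ('b', 1), ('b', 0), ('c', 1), ('c', 0), ('c', 0)]

n = len(pairs)
joint = Counter(pairs)                      # counts of each (x, y) pair
n_a = sum(1 for x, _ in pairs if x == 'a')  # marginal count of Event1 == 'a'

p_joint = joint[('a', 1)] / n    # joint probability P(a,1) = 2/9
p_cond  = joint[('a', 1)] / n_a  # conditional probability P(1|a) = 2/4

print(p_joint, p_cond)  # 0.222..., 0.5
```

The two quantities only agree when X takes a single value, which is why mixing them up changes the mutual information.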
The mutual information of two discrete random variables $X$ and $Y$, taking values in $R_X$ and $R_Y$ respectively, is the difference between the expectation of $\log p(x,y)$ (the logarithm of the joint probability of $(X,Y)$) and the expectation of $\log(p(x)p(y))$ (which would be the joint probability for independent variables having marginal probabilities $p(x)$ and $p(y)$).
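Spelled out, this definition is just the standard identity

$$I(X;Y) \;=\; \mathbb{E}\bigl[\log p(X,Y)\bigr] \;-\; \mathbb{E}\bigl[\log\bigl(p(X)\,p(Y)\bigr)\bigr] \;=\; \sum_{x \in R_X} \sum_{y \in R_Y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},$$

which is zero exactly when $X$ and $Y$ are independent.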
When the vectors constitute an iid sample of $(X,Y)$, we can compute the mutual information of their joint empirical density. This is just the observed frequency: if a particular combination of values $(x,y)$ occurs $k(x,y)$ times in the dataset out of $n$ total occurrences, the empirical density $\hat{p}(x,y)$ is just the ratio $k(x,y)/n$.
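In code, the empirical density is a one-liner over the paired observations; here is a minimal sketch with made-up sample vectors (the variable names are my own):

```python
from collections import Counter

# Hypothetical paired observations of (X, Y); any two equal-length vectors work.
x = ['a', 'a', 'b', 'b', 'b', 'c']
y = [ 1,   0,   1,   1,   0,   1 ]

n = len(x)
k = Counter(zip(x, y))                       # k(x, y): count of each observed pair
p_hat = {xy: c / n for xy, c in k.items()}   # empirical density k(x, y) / n

print(p_hat)  # e.g. the pair ('b', 1) occurs twice, so its density is 2/6
```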
To compute expectations with respect to the empirical density, let's introduce some notation. Let $R$ ("rows") be the set of distinct observed values of $X$ and $C$ ("columns") the set of distinct observed values of $Y$. For $x \in R$ and $y \in C$, $k(x,*) = \sum_{y \in C} k(x,y)$ is the row sum, counting all elements of the dataset whose first component is $x$. Likewise, $k(*,y) = \sum_{x \in R} k(x,y)$ is the column sum. These determine the marginal densities. Notice that the sum of all the $k(x,y)$, the sum of all the $k(x,*)$, and the sum of all the $k(*,y)$ all count the elements of the dataset, whence they are all equal to $n$. The mutual information equals

$$\hat{I}(X;Y) \;=\; \sum_{x \in R} \sum_{y \in C} \frac{k(x,y)}{n}\,\log\frac{n\,k(x,y)}{k(x,*)\,k(*,y)}.$$
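This formula translates directly into code. Below is a minimal sketch (the function name `mutual_information` is my own) that computes it from the counts $k(x,y)$, $k(x,*)$, and $k(*,y)$ defined above, using natural logarithms; swap `math.log` for `math.log2` to get the answer in bits:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information of two equal-length sequences, in nats."""
    n = len(xs)
    k = Counter(zip(xs, ys))   # k(x, y): joint counts
    k_row = Counter(xs)        # k(x, *): row sums
    k_col = Counter(ys)        # k(*, y): column sums
    # Sum (k/n) * log(n*k / (k_row * k_col)) over observed pairs only;
    # unobserved pairs have k(x, y) = 0 and contribute nothing.
    return sum(c / n * math.log(n * c / (k_row[x] * k_col[y]))
               for (x, y), c in k.items())

# Example: perfectly dependent variables give I = log(2) ≈ 0.693 nats.
print(mutual_information([0, 0, 1, 1], ['a', 'a', 'b', 'b']))
```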