Consider a sequence ABCD where the letters are independent. If we assume equal probabilities of ¼, the maximum Shannon entropy of this distribution is clearly log2(4) = 2 bits, so each letter contributes 2/4 = 0.5 bits to the total. Now let's say the probability distribution is A = 0.28, B = 0.42, C = 0.12 and D = 0.18; the Shannon entropy of the sequence is then approximately 1.85 bits, i.e. about 0.46 bits per letter on average. To calculate the information at each position, I subtracted each letter's observed entropy term from its maximum-entropy term of 0.5 bits. The information of each letter is thus (a quick Python check follows the list):
I(A) = 0.5 - 0.28*log2(1/0.28) = -0.01422035, and similarly
I(B) = -0.02564628
I(C) = 0.1329328
I(D) = 0.05469239
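For concreteness, here is a minimal Python sketch of the calculation I did (the variable names are just illustrative; only the probabilities above go in):

```python
from math import log2

# Probabilities from the question above
p = {"A": 0.28, "B": 0.42, "C": 0.12, "D": 0.18}

# Under the uniform distribution each letter's entropy term is
# 0.25 * log2(4) = 0.5 bits; this is the "maximum" term I subtract from.
uniform_term = 0.25 * log2(4)

# Per-letter "information" as defined above:
# uniform term minus observed entropy term q*log2(1/q)
info = {s: uniform_term - q * log2(1 / q) for s, q in p.items()}

H = sum(q * log2(1 / q) for q in p.values())  # observed entropy, ~1.85 bits

print(info)                # A and B come out negative
print(sum(info.values()))  # ~0.148 bits in total
print(2 - H)               # same total, i.e. 2 - 1.85 bits
```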
The total information of the sequence is I(A) + I(B) + I(C) + I(D) ≈ 0.15 bits, as expected (2 - 1.85 bits). But what I find interesting are the negative information values for letters A and B. Information is by definition a non-negative quantity. Have I missed something, or am I using the concept in the wrong way? Thanks in advance.