Entropy is considered a measure of the uncertainty of a random variable. For a discrete random variable X, it is formally defined as the negative expected value of the logarithm of the probability mass function of X. Its original treatment comes from the axiomatic derivation of its form in Shannon's 1948 paper, "A Mathematical Theory of Communication". The basic idea behind entropy as a concept is to use the inverse of the pmf, 1/p(x) (taken through a logarithm), to account for the amount of surprise, and hence the information, one receives when an event occurs. The more probable an event is (in the extreme, an event I know will occur), the less information is associated with it.

So I wonder: why is the inverse of the pmf used to account for the amount of surprise? Wouldn't any other decreasing function do? It seems to me that the axiomatic treatment has overshadowed the intuition behind the particular mathematical function chosen as the true measure of information. I would appreciate your comments on this perspective.
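For concreteness, here is the definition I am referring to, written out from the verbal description above (the base of the logarithm only fixes the units, e.g. bits for base 2):

$$ H(X) \;=\; -\sum_{x} p(x)\,\log p(x) \;=\; \mathbb{E}\!\left[\log \frac{1}{p(X)}\right], $$

so the "surprise" attached to an individual outcome $x$ is $\log \frac{1}{p(x)}$, and entropy is its expected value.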