I have been separately exploring machine learning, Markov chains, and statistical mechanics/information theory. These topics seem deeply interconnected, but since they come from very different disciplines it is hard to get clear answers.
The concrete question is:
Say you have a phenomenon F with entropy S. Now let M be a "fairly good and sufficiently compact" model of F. How much information does the model contain about F?
For example, the model could be M(x) = x^2 with F ~= M. In that case x^2 evidently carries information about F, but how would you measure it?
I'm trying to understand this. Thanks in advance!