I have been separately exploring machine learning, Markov chains, and statistical mechanics/information theory. These topics seem deeply interconnected, but since they come from very different disciplines it is hard to get clear answers.
The concrete question is:
Say you have a phenomenon F with entropy S. Now let M be a "fairly good and sufficiently compact" model of F. How much information does the model contain about F?
For example, the model could be M(x) = x^2 with F ~= M. In that case x^2 evidently carries information about F, but how would you measure it?
I'm trying to understand this. Thanks in advance!