I want to create an algorithm that auto-populates the CPTs, using trial runs of a Monte Carlo simulation to determine the values in all the cross-state cells of the Conditional Probability Table (CPT) used in Bayesian inference.
Well, sort of. For example, let's say I have a simple network with three parent nodes X1, X2 and X3, and further suppose these are discrete nodes with 3, 4 and 3 states respectively, and the child node Y has 4 states. So the connecting CPT is a matrix with 36 rows, corresponding to the combinations of X1, X2 and X3 states, and 4 columns, corresponding to the 4 child states. Suppose further that I have access to experimental or external simulator data showing the results of Y for different combinations of X1, X2 and X3.
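As a quick check on the size of that table, here is a minimal sketch that counts the rows and cells for the example above. The variable names are only illustrative, and the "free" count assumes each row of the CPT is a probability distribution that sums to 1.

```python
from math import prod

parent_states = [3, 4, 3]  # states of X1, X2, X3 (from the example)
child_states = 4           # states of Y

# One row per joint configuration of the parents.
rows = prod(parent_states)
# One cell per (parent configuration, child state) pair.
cells = rows * child_states
# Assumption: each row sums to 1, so one cell per row is determined by the others.
free = rows * (child_states - 1)

print(rows, cells, free)  # 36 144 108
```

The row count is the product of the parent state counts, not their sum.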
It seems to me I should be able to back-fit the coefficients numerically from a set of simultaneous algebraic equations, giving the 144 coefficients (cross states; 36 rows × 4 columns) needed to populate the CPT.
My question is: what is the form of the set of simultaneous algebraic equations to solve for the 144 CPT coefficients, given a sufficient number of provided Xs and Ys?
Sorry, I don't see it as a Bayesian probability problem, but rather as a combinatorial (Nodes, Sinks) network problem, where you want to find the Arcs representing the coefficients. The algebraic equations you mention come from the equilibrium state: flow in = flow out at every node.
Yes, I agree, and that is what I am trying to do in a very general way with "n" ins and one out. From my view, though, there are only n arcs for n parents to a single child (i.e. the number of arcs is equal to the number of parent nodes). The number of cross-coefficients found in the CPT array, however, is the product of the number of states associated with the child times the sum of the states associated with each parent node attached to the single child node.
I was hoping someone had approached this issue previously and that there might be a publication describing the construction of the algebraic equations for determining the discrete-state cross-coefficients of the CPT array for "n" parents, each with an arbitrary number of states, and one child with an arbitrary number of discrete states.
Furthermore, how many solutions for the child from the n parents are needed (obtained from experiment or from Monte Carlo runs of a simulation)? I believe that once I have constructed the set of algebraic equations, it will be obvious how many results are needed (i.e. how many external "solutions", or, putting it another way, how many simultaneous equations) to provide a well-defined solution set.
I had originally felt that, as an alternative approach, a utility could be developed to auto-populate a CPT from a sufficient number of Monte Carlo runs of the non-BN system that the BN was simulating. I reasoned this should be a simple set of linear simultaneous equations, where the evidence provided to the parents and the inference in the child node are treated as constants and the CPT cell values are the unknowns.
I derived the equations, but unfortunately I had made a mistake (adding the parent states rather than multiplying them), which invalidated my derivation. I have since found my error and now all is right with the world.
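For anyone wanting to try this, here is a minimal sketch of one way such a linear system could be set up. It is not necessarily the derivation described above. It assumes each run supplies independent (soft) evidence on the parents, so that P(Y = y) is the sum over all joint parent configurations of P(X1) * P(X2) * P(X3) times the matching CPT cell. The "true" CPT, the random evidence, the run count and all names are hypothetical stand-ins for the external simulator and its data, and the observations are noise-free.

```python
import numpy as np

rng = np.random.default_rng(0)
parent_states = [3, 4, 3]   # states of X1, X2, X3
child_states = 4            # states of Y
n_rows = int(np.prod(parent_states))  # 36 joint parent configurations

# Hypothetical "true" CPT standing in for the external simulator.
true_cpt = rng.dirichlet(np.ones(child_states), size=n_rows)  # shape (36, 4)

def design_row(marginals):
    """Kronecker product of the parent marginals: the probability
    weight of every joint parent configuration for one run."""
    w = np.ones(1)
    for p in marginals:
        w = np.kron(w, p)
    return w

n_runs = 200  # assumed number of runs; at least 36 independent ones are needed
A = np.empty((n_runs, n_rows))
for r in range(n_runs):
    # Random soft evidence on each parent (hypothetical data).
    marginals = [rng.dirichlet(np.ones(k)) for k in parent_states]
    A[r] = design_row(marginals)

# Child distributions the simulator would report for each run (noise-free here).
B = A @ true_cpt  # shape (n_runs, 4)

# Solve A @ cpt = B for the CPT cells, one column per child state.
est_cpt, *_ = np.linalg.lstsq(A, B, rcond=None)

print("rank of A:", np.linalg.matrix_rank(A))
print("max abs error:", np.abs(est_cpt - true_cpt).max())
```

In this formulation the system decouples by child state: each column of the CPT has 36 unknowns, so at least 36 runs with linearly independent evidence rows are needed. With noisy Monte Carlo results the least-squares estimate can fall slightly outside [0, 1], so some clipping and row renormalization, or a constrained solver, would probably be needed. If the runs use hard evidence (each parent fixed to one state), each CPT row is simply the observed frequency of Y for that parent combination.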