I've been studying Higher-Order Factorization Machines for a couple of days now.
I started with Rendle's original paper (2010: http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle2010FM.pdf) before moving on to a more recent paper that I found interesting because it sheds light on kernels and polynomial networks at the same time (2016: https://arxiv.org/pdf/1607.08810.pdf).
I'm trying to implement the coordinate descent algorithm from the 2016 paper for factorization machines (second order, m=2).
But one thing bothers me: for the weight vector w, the author says (in 9.3): "w is a vector of first order weights, estimated from training data." I don't know exactly what is meant by that. How can we estimate these weights from the training data?
I had an idea, but I'm not sure it's right. It would go like this:
1/ we suppose that we have a linear model, y = ⟨w, x⟩; we perform gradient descent on this and get back the weight vector w.
2/ we then suppose (this is the step described in the paper) that y = y_{A^2} and perform coordinate descent on this to retrieve the matrix P.
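For concreteness, here is a minimal NumPy sketch of the two-step idea above. This is my own illustration, not the paper's algorithm: it assumes a squared loss and synthetic data, and it fits P on the residuals with plain gradient descent as a stand-in for the paper's coordinate descent (the gradient of the second-order term uses Rendle's O(ndk) identity from the 2010 paper).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3                      # samples, features, factors

def fm_second_order(X, P):
    """Second-order FM term sum_{j<j'} <p_j, p_j'> x_j x_j',
    computed with Rendle's identity 0.5 * ((X P)^2 - X^2 P^2),
    summed over the k factors."""
    return 0.5 * ((X @ P) ** 2 - (X ** 2) @ (P ** 2)).sum(axis=1)

# Synthetic data with both a linear part and pairwise interactions.
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
true_P = rng.normal(size=(d, k))
y = X @ true_w + fm_second_order(X, true_P) + 0.01 * rng.normal(size=n)

# Step 1: gradient descent on the linear model y = <w, x> (squared loss)
# to estimate the first-order weights w from the training data.
w = np.zeros(d)
for _ in range(500):
    w -= 0.1 * X.T @ (X @ w - y) / n

# Step 2: fit P on the residuals r = y - X w, i.e. treat r as the target
# of the second-order term y_{A^2}.  Plain gradient descent here, as a
# stand-in for the paper's coordinate descent.
r = y - X @ w
P = 0.1 * rng.normal(size=(d, k))
for _ in range(500):
    err = fm_second_order(X, P) - r
    grad_P = (X.T @ (err[:, None] * (X @ P))
              - ((X ** 2).T @ err)[:, None] * P) / n
    P -= 0.01 * grad_P

print("residual MSE after fitting P:",
      np.mean((fm_second_order(X, P) - r) ** 2))
```

Note that this is exactly the "sequential" scheme I describe above (w first, then P), which is why I'm unsure it's correct: the paper seems to suggest w and P should be learned jointly.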
Could anyone explain how I should proceed?
Thanks!