06 June 2017

I've been studying higher-order factorization machines for a couple of days now.

I started with Rendle's original paper (2010: http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle2010FM.pdf ) before moving on to a more recent paper that I found interesting because it sheds light on kernels and polynomial networks at the same time (2016: https://arxiv.org/pdf/1607.08810.pdf).

I'm trying to implement the paper's coordinate descent algorithm for factorization machines (second order, m = 2).

But one thing bothers me: regarding the vector w, the authors say (in Section 9.3): "w is a vector of first order weights, estimated from training data." I don't know exactly what they mean by that. How can we estimate w from the training data?
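For context, here is where w appears as I understand it (my notation, adapted from Rendle's 2010 paper; the rewriting in terms of the ANOVA kernel is my reading of the 2016 paper's notation and may be off):

y(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i < j} \langle p_i, p_j \rangle x_i x_j

where the pairwise sum is, if I read correctly, the degree-2 ANOVA kernel term y_{A^2} built from the rows p_i of the factor matrix P. So w multiplies each feature individually, while P only enters through the pairwise terms.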

I had an idea, but I don't think it's good. It would go like this:

1/ we suppose that we have a plain linear model, y = ⟨w, x⟩, perform gradient descent on it, and get back the vector of weights w.

2/ we suppose (this step is the one present in the paper) that y = y_{A^2} and perform coordinate descent on it to retrieve the matrix P.
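To make my two-step idea concrete, here is a small self-contained sketch in NumPy. Everything here (toy data, sizes, the regularization constant lam, the learning rate) is my own assumption for illustration, not the paper's algorithm: step 1/ fits w by gradient descent on a plain linear model, then step 2/ freezes w and fits the factor matrix V by exact coordinate descent on the squared loss, using the fact that the FM prediction is linear in each single entry of V.

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_predict(X, w, V):
    """Second-order FM: <w, x> + sum_{i<j} <v_i, v_j> x_i x_j,
    computed via the identity 0.5 * sum_f ((X V)_f^2 - X^2 (V^2)_f)."""
    Q = X @ V                                      # (n, k)
    pair = 0.5 * (Q**2 - (X**2) @ (V**2)).sum(axis=1)
    return X @ w + pair

# Toy data from a planted second-order FM (sizes are arbitrary).
n, d, k = 60, 5, 2
X = rng.normal(size=(n, d))
y = fm_predict(X, rng.normal(size=d), rng.normal(size=(d, k)))

# Step 1/: estimate w with a plain linear model by gradient descent.
w = np.zeros(d)
for _ in range(3000):
    w -= 0.05 * X.T @ (X @ w - y) / n
mse_linear = np.mean((X @ w - y) ** 2)

# Step 2/: freeze w, fit the factors V by coordinate descent.
# For each entry V[j, f], the prediction is linear in that entry with
# slope h = x_j * (q_f - V[j, f] * x_j), which does not depend on the
# entry itself, so the coordinate minimizer is available in closed form.
lam = 1e-3                      # L2 penalty on V (assumed)
V = 0.01 * rng.normal(size=(d, k))
pred = fm_predict(X, w, V)
Q = X @ V
for sweep in range(50):
    for f in range(k):
        for j in range(d):
            h = X[:, j] * (Q[:, f] - V[j, f] * X[:, j])
            old = V[j, f]
            V[j, f] = h @ (y - pred + old * h) / (h @ h + lam)
            delta = V[j, f] - old
            pred += delta * h            # keep cached predictions in sync
            Q[:, f] += delta * X[:, j]   # keep cached inner products in sync

mse_fm = np.mean((fm_predict(X, w, V) - y) ** 2)
```

On this toy problem the coordinate-descent sweeps drive the MSE well below the linear-only fit, but this only illustrates the two-step scheme I described above, not necessarily what the paper intends (the paper may well update w and P jointly).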

Could anyone explain to me how I should proceed?

Thanks !
