Such automation is a great and promising idea, and PARAFAC seems relatively well suited to it thanks to its uniqueness. I think you should indeed automate the data-cleaning steps such as handling scatter and outliers. Once the data are cleaned, you should automate model selection, i.e., the number of components and their identification (with respect to sign and ordering), but also a remedy, or at least an alarm, once you obtain degenerate solutions (if degeneracies are possible at all, since you possibly use nonnegativity constraints throughout).
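A degeneracy alarm like the one suggested above could be based on Tucker's congruence coefficient: in a two-factor degeneracy, a pair of components becomes nearly perfectly negatively correlated across all three modes, so the product of their per-mode congruences approaches -1. A minimal sketch in plain Python; the -0.85 threshold and the `degeneracy_alarm` helper are illustrative assumptions, not part of any established toolbox:

```python
def congruence(a, b):
    """Tucker congruence (uncentered cosine) between two loading vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def degeneracy_alarm(modes, threshold=-0.85):
    """Flag component pairs whose triple cosine product approaches -1.

    modes: list of three loading matrices, each given as a list of
    components (one loading vector per component).
    Returns a list of (i, j, product) tuples for suspect pairs.
    """
    n_components = len(modes[0])
    suspects = []
    for i in range(n_components):
        for j in range(i + 1, n_components):
            product = 1.0
            for mode in modes:
                product *= congruence(mode[i], mode[j])
            if product < threshold:  # near -1: likely two-factor degeneracy
                suspects.append((i, j, product))
    return suspects
```

For example, two components that are mirror images of each other in every mode yield a triple product of -1 and would trigger the alarm.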
Good suggestions, thanks. I like the idea of also assessing degeneracy. Actually, maybe that could be generalized to a 'happy-meter'. Once the model is decided, it would be nice to know whether you can actually trust it; something that tells you "this is the best model, but we are only 60% happy about it".
In itself a good idea, but then also give some automated suggestions as to what could be the problem (e.g., you might be using the wrong model, or you took too many components). Offering a remedy would of course be even better, e.g. in line with Paolo Giordani's talk at TRICAP2015.
Yes, a remedy would be nice, but I actually think that is what the automation is doing. If we automate scatter handling, inner filter effects, outlier detection, and choosing the number of components, then in essence we are already providing a remedy to those problems.
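The preprocessing steps listed above could be organized as a simple ordered pipeline, where each stage transforms the data and the pipeline logs what was applied. A minimal structural sketch; the stage functions here are hypothetical placeholders (a real implementation would interpolate over scatter regions, apply absorbance-based inner filter correction, and so on):

```python
# Hypothetical stage functions: each takes an EEM record and returns an
# updated copy. Here they only mark that the step ran.
def remove_scatter(eem):
    return {**eem, "scatter_removed": True}

def correct_inner_filter(eem):
    return {**eem, "ife_corrected": True}

def detect_outliers(eem):
    return {**eem, "outliers_checked": True}

PIPELINE = [
    ("scatter handling", remove_scatter),
    ("inner filter correction", correct_inner_filter),
    ("outlier detection", detect_outliers),
]

def run_pipeline(eem, stages=PIPELINE):
    """Apply each cleaning stage in order and log which stages ran."""
    log = []
    for name, stage in stages:
        eem = stage(eem)
        log.append(name)
    return eem, log
```

The point of the fixed stage order and the log is that an automated tool can later report exactly which remedies were applied to a given dataset.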
Still, we may end up with a bad model, so we need a number that tells us whether the model is nice. Later we can find remedies for the 'new' bad models, but first we need to know that there is a problem, and only then identify what it is.
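Such a single trust number could be a weighted combination of standard PARAFAC diagnostics, e.g. core consistency, explained variance, and split-half similarity. A minimal sketch of the 'happy-meter' idea; the choice of diagnostics, their normalization to [0, 1], and the weights are all illustrative assumptions:

```python
def happy_meter(core_consistency, explained_variance, split_half_similarity,
                weights=(0.4, 0.3, 0.3)):
    """Combine model diagnostics (each scaled to [0, 1]) into one percentage."""
    diagnostics = (core_consistency, explained_variance, split_half_similarity)
    clamped = [max(0.0, min(1.0, d)) for d in diagnostics]  # guard bad inputs
    score = sum(w * d for w, d in zip(weights, clamped))
    return 100.0 * score  # e.g. "we are only 60% happy about it"
```

A model with perfect diagnostics scores 100; a mediocre one might score around 60 and trigger the "only 60% happy" message.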
By the way, two-factor degeneracy is not really a big problem in fluorescence data, I would say? It is in PARAFAC analysis in general, but not with EEM data.
What do you mean by changing the number of components, Vipavee? Do you mean that it should be easy to inspect both a three- and a five-component model? That is actually possible in some software packages, I believe.
I am sorry, I am a little confused about the automation project. Does it mean there will be an independent software package for DOM PARAFAC, or just code for a function that integrates into current mainstream fluorescence spectrometers (e.g., the Horiba Aqualog)? Or the same toolbox for MATLAB? What I am very interested in is whether the "outlet" of this automation project will be an easy, button-by-button (user-friendly) design, so that scientists just need to focus on choosing parameters and deciding on the final results.
We used the Aqualog to extract EEMs and do inner filter effect removal, but the other steps of the PARAFAC analysis still need manual operation. I have to say your prospective work will be very amazing and wonderful. Importantly, I think the automation project should also include a strict guide for preparing data for the automated operation, because I think abuse of PARAFAC in DOM analysis is not rare currently; sometimes we find works that did not even discern the DOM sample sources or sample numbers. For example, I still do not understand how one can obtain 5 components from fewer than 6 DOM samples.
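The last point could be turned into an automated sanity check before any model is fitted: warn (or refuse to fit) when the number of samples is too small for the requested number of components. A minimal sketch; the minimum samples-per-component ratio below is a heuristic assumption, not an established rule:

```python
def check_sample_count(n_samples, n_components, min_ratio=3):
    """Warn when the dataset is too small for the requested model size.

    Heuristic: require at least `min_ratio` samples per component
    (assumed value; adjust to your field's conventions).
    """
    ok = n_samples >= min_ratio * n_components
    if not ok:
        print(f"Warning: {n_components} components from only "
              f"{n_samples} samples is unlikely to be supportable.")
    return ok
```

Under this heuristic, fitting 5 components to fewer than 6 samples (as in the example above) would be flagged immediately.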
We are looking forward to your project updates. Great job!
Good comments, thanks. This is the science stage, so I am not yet thinking about where to implement this. That is of course important in the end, but before thinking about software we need to develop the science. So I am guessing it will go into PLS_Toolbox, our free tools at www.models.life.ku.dk, DrEEM, or whatever. Right now I am focused mainly on developing the methods. A strict guide is nice; we already have one in our EEMizer paper, which defines what you need to make nice PARAFAC models.
I am not familiar with PARAFAC, but I think I will use it later. For the automation, flexibility in the dimensions of the dataset and the types of input files may need to be supported. Flexibility to edit the appearance of the final graphs/results (artwork) would also be nice to include.
Thank you for your project; it will be useful for many researchers.
Good suggestions. I think we already have flexibility in input formats and dimensions, but that is definitely important. The visualization part may well be the most important part of all.