Topic analysis with a rarely occurring topic and a small document corpus. Which technique should I use?

James Polichak

Read them and score them. This is a task best performed by the human brain.

Especially if you are dealing with medical reports and data, as your hospital affiliation suggests.

Alexey Andreevich Sorokin

I do not think topic analysis on a collection of 50 documents would give robust and stable results, since LDA is generally an ill-posed problem with many solutions. Why not perform some soft clustering to detect "outliers" with peculiar topics? I am not an expert in topic modelling, but the authors of the work linked here propose a general model that encompasses both LDA and PLSA (though I do not know whether it is used in practice). If I understand them correctly, you could regularize the model to force the topics to be as diverse as possible, but that is by no means a "black-box" procedure.

https://www.researchgate.net/profile/Konstantin_Vorontsov/publication/262314923_Robust_PLSA_performs_better_than_LDA/links/54e9f3480cf25ba91c814c64.pdf

Conference paper: Robust PLSA performs better than LDA
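
To make the "outlier" idea concrete, here is a minimal sketch of a simpler stand-in for soft clustering: each document is scored by its mean TF-IDF cosine similarity to all the others, and the lowest-scoring documents are candidates for a peculiar, rarely occurring topic. This is not the regularized model from the paper, and it is not LDA. The function names (`tfidf_vectors`, `cosine`, `outlier_scores`) and the five toy documents are invented for illustration, and whitespace tokenization is an assumption that real reports would likely outgrow.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (naive whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms found in every document keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
             for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def outlier_scores(docs):
    """Mean similarity of each document to all others (needs >= 2 documents).

    A low score means the document shares little vocabulary with the rest.
    """
    vecs = tfidf_vectors(docs)
    scores = []
    for i, v in enumerate(vecs):
        sims = [cosine(v, w) for j, w in enumerate(vecs) if j != i]
        scores.append(sum(sims) / len(sims))
    return scores


# Toy corpus, invented for illustration only: four similar notes, one odd one.
docs = [
    "patient reports chest pain and shortness of breath",
    "patient reports chest pain after exercise",
    "shortness of breath and chest pain at rest",
    "patient reports mild chest pain and fatigue",
    "insurance paperwork delayed by billing office",
]

scores = outlier_scores(docs)
# Indices sorted from least to most similar to the rest of the corpus.
ranked = sorted(range(len(docs)), key=lambda i: scores[i])
print(ranked[0])  # prints 4: the last document shares no words with the others
```

With only about 50 documents, a ranking like this is cheap to check by hand: read the few lowest-scoring documents and decide whether they really carry the rare topic.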

Massimiliano Grassi

Thank you very much Alexey! I will take a look at the paper!