What is the minimum number of documents needed to train an LDA/SLDA topic model? Is a corpus of 200 documents large enough?

Marion G Ceruti

Hi Davide,

I am not an expert in LDA/SLDA, but I would think it depends on the size of the documents, among other variables.
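The point that the answer depends on document size can be made concrete by reporting total tokens and vocabulary size alongside the document count. The sketch below is a minimal illustration that assumes lowercase whitespace tokenization; `corpus_stats` is a hypothetical helper, not part of any LDA library.

```python
from statistics import mean, median


def corpus_stats(docs):
    """Summarize a non-empty list of raw-text documents.

    Tokenization is plain lowercase whitespace splitting (an assumption;
    real pipelines usually also remove stop words and punctuation).
    """
    token_lists = [doc.lower().split() for doc in docs]
    lengths = [len(tokens) for tokens in token_lists]
    vocab = {tok for tokens in token_lists for tok in tokens}
    return {
        "n_docs": len(docs),
        "total_tokens": sum(lengths),
        "mean_len": mean(lengths),
        "median_len": median(lengths),
        "vocab_size": len(vocab),
    }


docs = ["the cat sat on the mat", "dogs chase cats", "topic models need enough tokens"]
print(corpus_stats(docs))
```

Two corpora of 200 documents can differ by orders of magnitude in `total_tokens`, which is what the model actually learns from.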

For better help with this, I recommend my colleague Emily Medina, who is also on ResearchGate.

Kind regards,

Marion

Colin Layfield

That's a good question. I have seen various reports on this, but I don't think there is a magic number. The general rule of thumb is the more the better (my background is LSA, not LDA). Your documents sound like they are of a decent size, so you may also want to consider segmenting them into smaller documents. That would be an interesting experiment; I have seen research that took approaches like this and came up with interesting results.
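Segmenting long documents into smaller ones can be sketched as splitting each document into fixed-size token windows. This is a minimal sketch assuming whitespace tokens; the window of 100 tokens is an arbitrary illustrative value, not a recommendation, and `segment` / `segment_corpus` are hypothetical helpers.

```python
def segment(doc, window=100):
    """Split one document into consecutive chunks of at most `window` tokens.

    Whitespace tokenization is assumed; the last chunk may be shorter.
    """
    tokens = doc.split()
    return [" ".join(tokens[i:i + window]) for i in range(0, len(tokens), window)]


def segment_corpus(docs, window=100):
    """Segment every document, remembering which source document each chunk came from."""
    chunks, sources = [], []
    for doc_id, doc in enumerate(docs):
        for chunk in segment(doc, window):
            chunks.append(chunk)
            sources.append(doc_id)
    return chunks, sources


corpus = [" ".join(f"tok{i}" for i in range(250)), "short text here"]
chunks, sources = segment_corpus(corpus, window=100)
print(len(chunks), sources)  # 4 [0, 0, 0, 1]
```

Keeping `sources` means chunk-level topic proportions from the trained model can later be averaged back to the original documents.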

Taimoor Khan

I am currently evaluating different extensions of LDA (knowledge-based ones in particular). I have two datasets: one with hundreds of documents and the other with thousands.

The dataset with thousands of documents produces more reliable results according to human evaluations. When comparing different models through topic coherence, it tells them apart to a good degree, whereas their scores tend to converge when there are only hundreds of documents.
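The coherence measure used above is not named. As one illustration (an assumption, not necessarily the measure in that evaluation), the sketch below computes UMass coherence: for a topic's top words ordered by probability, it sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs with j < i, where D counts documents containing the word(s). Scores closer to zero mean the top words co-occur more often. `umass_coherence` and `mean_coherence` are hypothetical helpers; gensim's `CoherenceModel` offers a tested implementation.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    top_words: the topic's top words, ordered by decreasing probability.
    docs: list of token lists. Each top word is assumed to occur in at
    least one document (otherwise the denominator would be zero).
    """
    doc_sets = [set(d) for d in docs]

    def df(*words):
        # number of documents containing all of the given words
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log((df(top_words[i], top_words[j]) + 1) / df(top_words[j]))
    return score


def mean_coherence(topics, docs):
    """Average UMass coherence over a model's topics (each a list of top words)."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)


docs = [["apple", "banana"], ["apple", "banana"], ["apple", "cherry"], ["cherry", "date"]]
print(umass_coherence(["apple", "banana"], docs))  # 0.0
print(umass_coherence(["apple", "date"], docs))    # log(1/3), about -1.0986
```

Comparing `mean_coherence` for two models on the same reference corpus is one way to check whether their scores separate or converge as the corpus grows.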

Training also doesn't take long with baseline LDA. With knowledge-based variants, however, performance is a concern.

Khushboo Thaker

The number of documents required for training is also related to the number of topics your LDA model is going to learn. Intuitively, if you have 200 topics and only one or two documents representing each topic, it will be hard for LDA to learn the distribution across topics. If you look at research that builds simulated data to test Bayesian models, it clearly shows that the number of documents per topic is important.
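Simulated data of this kind can be generated from the standard LDA generative process: each topic draws a word distribution phi ~ Dirichlet(beta), each document draws topic proportions theta ~ Dirichlet(alpha), and each token draws a topic z from theta and then a word from phi_z. The sketch below uses only the standard library; alpha = 0.5, beta = 0.1, the vocabulary size, document length, and seed are arbitrary illustrative assumptions. `dominant_topic_counts` is a hypothetical diagnostic that counts how many documents have each topic as their largest component.

```python
import random
from itertools import accumulate


def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) draw of dimension k via normalized Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]


def simulate_lda_corpus(n_docs, n_topics, vocab_size, doc_len, alpha=0.5, beta=0.1, seed=0):
    """Generate word-id documents plus the true theta (per document) and phi (per topic)."""
    rng = random.Random(seed)
    phis = [dirichlet(beta, vocab_size, rng) for _ in range(n_topics)]
    # cumulative weights let rng.choices sample each word by bisection
    phi_cum = [list(accumulate(phi)) for phi in phis]
    word_ids = range(vocab_size)
    docs, thetas = [], []
    for _ in range(n_docs):
        theta = dirichlet(alpha, n_topics, rng)
        zs = rng.choices(range(n_topics), weights=theta, k=doc_len)  # topic of each token
        doc = [rng.choices(word_ids, cum_weights=phi_cum[z])[0] for z in zs]
        docs.append(doc)
        thetas.append(theta)
    return docs, thetas, phis


def dominant_topic_counts(thetas, n_topics):
    """Count, for each topic, the documents in which it has the largest proportion."""
    counts = [0] * n_topics
    for theta in thetas:
        counts[max(range(n_topics), key=theta.__getitem__)] += 1
    return counts


for k in (10, 200):
    docs, thetas, _ = simulate_lda_corpus(n_docs=200, n_topics=k, vocab_size=200, doc_len=50)
    counts = dominant_topic_counts(thetas, k)
    print(f"K={k}: topics dominant in at most 2 documents: {sum(c <= 2 for c in counts)}")
```

With 200 documents and 200 topics, the average is one dominant document per topic, so many topics are barely represented; fitting a model to such a corpus and comparing its topics with the known `phis` is one way to probe how many documents per topic are needed.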

Thang Hoang Ta

I am also working with an LDA model on 3,400 quotations, treating each quotation as a document. I find it very difficult to differentiate topics with a small corpus and short documents. I am looking forward to a better solution.

Abiodun Abdullahi Solanke

I read this article because I was also curious about the best number of topics K to use in an LDA model. I have not applied their method, but it is close to what I've been trying.

Article A heuristic approach to determine an appropriate number of t...

Bibaswan Basu

You could look at this: https://docs.aws.amazon.com/comprehend/latest/dg/topic-modeling.html

Bethany Gray

Sample Size for Latent Dirichlet Allocation of Constructed-Response Items (page 263 in Quantitative Psychology; Marie Wiberg, Dylan Molenaar, Jorge González, Ulf Böckenholt, Jee-Seon Kim) has a good table with cutoff values.