Topic modelling can be based on whatever unit of text is relevant for you. Take LDA (https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) for instance: instead of modelling the distribution of words for a topic, one can model the distribution of n-grams for a topic. The same is true for LSA/LSI (https://en.wikipedia.org/wiki/Latent_semantic_analysis) and NMF (https://en.wikipedia.org/wiki/Non-negative_matrix_factorization#Text_mining): instead of a term-document matrix, you can build an "n-gram-document" matrix. The computations then remain the same.
In practice, scikit-learn's vectorizers (http://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_extraction.text) can work at the n-gram level through the ngram_range parameter. The resulting matrix can be used as input to any topic modelling procedure (see http://scikit-learn.org/stable/auto_examples/applications/plot_topics_extraction_with_nmf_lda.html#sphx-glr-auto-examples-applications-plot-topics-extraction-with-nmf-lda-py), as in the sketch below.
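Here is a minimal sketch of that pipeline: an n-gram/document count matrix built with CountVectorizer, then factorized with NMF. The corpus, number of topics, and parameter values are illustrative placeholders, and get_feature_names_out assumes a recent scikit-learn version.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF

# Toy corpus, purely for illustration.
docs = [
    "machine learning models learn from data",
    "topic models discover latent structure in text",
    "neural networks are machine learning models",
]

# ngram_range=(1, 2) makes unigrams AND bigrams the vocabulary units,
# so the resulting matrix is an "n-gram - document" matrix.
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(docs)  # shape: (n_documents, n_ngrams)

# Factorize into 2 topics; each topic is a distribution over n-grams.
nmf = NMF(n_components=2, random_state=0)
doc_topic = nmf.fit_transform(X)   # document-topic weights
topic_ngram = nmf.components_      # topic-n-gram weights

# Print the top n-grams per topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(topic_ngram):
    top = weights.argsort()[::-1][:5]
    print(f"Topic {k}:", ", ".join(terms[i] for i in top))
```

Swapping NMF for sklearn.decomposition.LatentDirichletAllocation works the same way, since it also consumes the count matrix X unchanged.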
Yes, of course. I have been using n-grams for text analysis with LingPipe, which worked really well for me. You might consider tokenization as a pre-processing step.
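LingPipe is a Java library, but the same idea translates to the scikit-learn setup above: you can plug a custom tokenizer into the vectorizer as the pre-processing step. This is a hedged sketch assuming NLTK is installed and its punkt tokenizer data has been downloaded (nltk.download('punkt')).

```python
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer

# Supplying a tokenizer makes the vectorizer build its n-grams from the
# tokens this callable returns, instead of its default token pattern.
vectorizer = CountVectorizer(
    tokenizer=word_tokenize,  # pre-processing: NLTK tokenization
    ngram_range=(1, 2),
    lowercase=True,
)
X = vectorizer.fit_transform(["Tokenization happens before n-grams are counted."])
print(vectorizer.get_feature_names_out())
```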