How can I get the topics in a dynamic text?

Noha Elprince

You may like to read about text mining, and specifically about how to calculate tf-idf (term frequency-inverse document frequency), a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. With it, you can compute a ranking score for every document in your corpus relative to your search query, for example "nano-technology".

To learn more about tf-idf score, you may like to read:

http://en.wikipedia.org/wiki/Tf%E2%80%93idf

http://pyevolve.sourceforge.net/wordpress/?p=1589

For implementation of tf-idf in Python, you may like to check this URL:

http://aimotion.blogspot.ca/2011/12/machine-learning-with-python-meeting-tf.html
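As a rough illustration of the ranking idea above, here is a minimal sketch in plain Python. It is not taken from the linked pages: the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, the helper names `tokenize`, `tf_idf` and `rank`, and the three sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real pre-processing.
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def rank(query, docs):
    """Return document indices, best match first, by summed query-term weight."""
    weights = tf_idf(docs)
    q_terms = tokenize(query)
    scores = [sum(w.get(t, 0.0) for t in q_terms) for w in weights]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical mini-corpus, only for demonstration.
docs = [
    "nano-technology enables new sensor materials",
    "topic models summarize large text collections",
    "sensor networks collect large data streams",
]
ranking = rank("nano-technology", docs)
```

Here `ranking` lists the document indices from most to least relevant to the query. A production setup would more likely rely on an existing library implementation with proper tokenization and smoothing.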

Donni Darsh

Hi Noha,

Thank you.

I am already familiar with tf-idf, pre-processing, filtering, NLP, and so on. My question is about identifying the topics in the results of a search query: how do I know the number of topics/clusters in advance?

I would like to use clustering algorithms or topic modeling in a real application that deals with dynamic data, where the input parameters of the algorithm should change according to the nature of the dataset.

Any ideas?

Thanks,

Abdul Wahid

Dear Donni,

I understand you have two separate problems. The first is that your data is too large, or that it just keeps coming (it is dynamic). The second is that you want to determine the number of clusters.

To address the first problem, I think you should have a look at streaming/online text processing with LDA.

To address the second problem, check out hierarchical LDA.

The following links might help you out.

http://www.cs.princeton.edu/~blei/topicmodeling.html

https://github.com/jessykate/streamLDA

Kind Regards

Hello Abdul,

Thank you. Of course I have a large dataset (a big repository), but I want to identify the topics only in the data returned by my search, so a different search gives different topics.

Do you think that estimating the number of topics with hierarchical LDA works well in practice?

I'm looking for a topic identification method that I can actually use in a real application.

Best Regards,

Donni

Sorry, I misunderstood your question.

You may like to read about "subtractive clustering", which determines the number of clusters dynamically. I attach a paper that uses it:

I hope this may help,

Noha

Article Clustering: A neural network approach
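To make the idea of subtractive clustering more concrete, here is a minimal sketch in plain Python. It is a simplified variant, not the method of the attached paper: it uses a single stopping threshold, and the parameter names and defaults (`radius`, `squash`, `stop_ratio`) as well as the toy 2-D points are assumptions. For text, the points would be document vectors (for example tf-idf vectors of the search results), scaled to a comparable range.

```python
import math


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtractive_clustering(points, radius=0.5, squash=1.5, stop_ratio=0.15):
    """Pick cluster centers among the points; their number is not fixed upfront."""
    if not points:
        return []
    alpha = 4.0 / radius ** 2              # neighbourhood size for the potential
    beta = 4.0 / (squash * radius) ** 2    # wider neighbourhood for the subtraction
    # Potential of a point: high when many other points lie close to it.
    potential = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=lambda i: potential[i])
        peak = potential[k]
        # Stop once the best remaining candidate is weak compared to the first center.
        if peak < stop_ratio * first_peak:
            break
        centers.append(points[k])
        # Reduce the potential around the new center so nearby points are not chosen.
        for i, p in enumerate(points):
            potential[i] -= peak * math.exp(-beta * sq_dist(p, points[k]))
    return centers


# Hypothetical data: two tight groups in the unit square.
points = [
    (0.1, 0.1), (0.12, 0.1), (0.1, 0.12), (0.11, 0.11),
    (0.9, 0.9), (0.88, 0.9), (0.9, 0.88), (0.89, 0.89),
]
centers = subtractive_clustering(points)
```

The number of centers found depends on `radius` and `stop_ratio`, so the approach replaces "choose the number of clusters" with "choose a neighbourhood size", which may or may not be easier for a given dataset.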

Kasper Christensen

It is an interesting problem...

Do you know which topics might show up in your data stream? Is it always text about cars, or is it cars today and bicycles tomorrow?

Does your data come with any metadata, so that you could potentially train a classifier?

Thushari Atapattu

Found this interesting article on "Real-time topic modeling of micro-blogs": http://www.oracle.com/technetwork/articles/java/micro-1925135.html

Hope you will find this useful