What could be the approaches to combine the pairwise document similarity scores to get the overall similarity score of a certain document against a document collection?
You might, for example:

- calculate the centroid of the document collection and then compute the similarity between the document and that centroid;
- average the similarities between the document and all documents in the collection;
- take the minimum or maximum (i.e., the least similar or the most similar) of the pairwise similarities between the document and all documents in the collection.

As the similarity measure, you might use, e.g., cosine similarity or Euclidean distance.
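A minimal sketch of these aggregation strategies, assuming documents are already represented as fixed-length vectors (e.g., TF-IDF or embeddings) and using cosine similarity; the function and parameter names are illustrative:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def collection_similarity(doc, collection, method="centroid"):
    # Score one document vector against a collection of document vectors.
    # method: "centroid", "average", "min", or "max" (illustrative names).
    if method == "centroid":
        # Compare against the mean vector of the collection.
        return cosine_sim(doc, np.mean(collection, axis=0))
    sims = [cosine_sim(doc, d) for d in collection]
    if method == "average":
        return float(np.mean(sims))
    if method == "min":
        return float(np.min(sims))
    if method == "max":
        return float(np.max(sims))
    raise ValueError(f"unknown method: {method}")

# Toy example with 3-dimensional document vectors.
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 1.0, 0.0]])
query = np.array([1.0, 1.0, 0.0])
print(collection_similarity(query, docs, "centroid"))
print(collection_similarity(query, docs, "max"))
```

Note that the centroid variant is cheaper (one comparison instead of N) but can hide outliers that min/max aggregation would expose.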
You can use a Bag-of-Words approach to extract keywords and then group documents based on them. I used a similar approach for grouping news articles. If you need more info, you can find it here
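A rough sketch of this idea (not the exact approach from the linked answer): extract the most frequent non-stopword terms per document as its keywords, then greedily group documents whose keyword sets overlap. All names and thresholds here are illustrative assumptions:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "is", "of", "in", "and", "to", "for", "as"}

def top_keywords(text, k=3):
    # Tokenize, drop stopwords, and keep the k most frequent words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {w for w, _ in counts.most_common(k)}

def group_by_keywords(articles, min_shared=1):
    # Greedily assign each article to the first group sharing at least
    # `min_shared` keywords; otherwise start a new group.
    groups = []  # list of (keyword_set, member_articles)
    for art in articles:
        kw = top_keywords(art)
        for gkw, members in groups:
            if len(kw & gkw) >= min_shared:
                members.append(art)
                gkw |= kw  # grow the group's keyword set in place
                break
        else:
            groups.append((set(kw), [art]))
    return groups

articles = [
    "stock market rises as investors buy stock",
    "market investors see stock gains",
    "football team wins championship match",
]
for keywords, members in group_by_keywords(articles):
    print(sorted(keywords), len(members))
```

The greedy single-pass grouping is order-dependent; for anything beyond a quick prototype, a proper clustering step (e.g., over TF-IDF vectors) would be more robust.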