How can we measure the quality of the built clusters?

Vipul Dabhi

Inter-cluster and intra-cluster distance measures may be useful. The maximum radius and average radius of each cluster can also be considered.
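A minimal sketch of these measures, assuming points are numeric tuples, distance is Euclidean, and a cluster's radius is measured from its centroid (the answer does not specify any of these). The function name `cluster_distance_summary` is hypothetical.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def cluster_distance_summary(points, labels):
    """Return per-cluster radii and the smallest centroid-to-centroid distance.

    Assumption: "radius" means Euclidean distance from a point to its
    cluster centroid; other definitions are possible.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    centroids = {label: centroid(members) for label, members in groups.items()}

    radii = {}
    for label, members in groups.items():
        dists = [math.dist(p, centroids[label]) for p in members]
        radii[label] = {
            "max_radius": max(dists),
            "avg_radius": sum(dists) / len(dists),
        }

    # Inter-cluster separation: distances between every pair of centroids.
    ordered = sorted(centroids)
    inter = [
        math.dist(centroids[a], centroids[b])
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
    ]
    min_inter = min(inter) if inter else None
    return radii, min_inter


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
radii, min_inter = cluster_distance_summary(points, labels)
print(radii, min_inter)
```

Small radii combined with a large minimum inter-cluster distance suggest compact, well-separated clusters.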

Gaetano Zazzaro

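You can use a classification algorithm where the class label is the cluster: if a classifier can reliably predict which cluster an item belongs to, the clusters are well separated.

Or use the purity and entropy measures: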

http://www.ijcse.com/docs/IJCSE11-02-03-105.pdf

http://www.ijarcce.com/upload/may/Validation%20of%20Document%20Clustering%20based%20on%20Purity%20and%20Entropy%20measures.pdf
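Purity and entropy both need a reference labeling. The sketch below assumes a ground-truth class is known for each item and uses the common definitions: purity is the fraction of items that belong to the majority class of their cluster, and entropy is the size-weighted average of the per-cluster class entropy (base 2, not normalized). The linked papers may use a different normalization, and the function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_labels, true_labels):
    """Return (purity, entropy) of a clustering against reference classes."""
    clusters = defaultdict(list)
    for cluster, truth in zip(cluster_labels, true_labels):
        clusters[cluster].append(truth)

    n = len(true_labels)
    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        # Purity: share of all items covered by this cluster's majority class.
        purity += max(counts.values()) / n
        # Entropy of the class distribution inside this cluster.
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy


cluster_labels = [0, 0, 0, 1, 1, 1]
true_labels = ["a", "a", "b", "b", "b", "b"]
print(purity_and_entropy(cluster_labels, true_labels))
```

Higher purity and lower entropy indicate clusters that agree more closely with the reference classes.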

Sanjiv K. Bhatia

I believe your idea of manually classifying a sample from your data to create ground truth is the best. You can then apply the standard measures of precision and recall to verify your clustering.
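One way to apply precision and recall to a clustering is to count pairs of items, since cluster IDs do not map directly onto class names. This is a sketch under that assumption (the answer does not say which variant is meant); the function name is hypothetical.

```python
from itertools import combinations


def pairwise_precision_recall(cluster_labels, true_labels):
    """Pair-counting precision and recall against a hand-labeled sample.

    A pair is a true positive when both items share a cluster and a class,
    a false positive when they share a cluster but not a class, and a
    false negative when they share a class but not a cluster.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_cluster = cluster_labels[i] == cluster_labels[j]
        same_class = true_labels[i] == true_labels[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


cluster_labels = [0, 0, 0, 1, 1, 1]
true_labels = ["a", "a", "b", "b", "b", "b"]
print(pairwise_precision_recall(cluster_labels, true_labels))
```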

Brian C. Franczak

It's a pleasant coincidence that this question was recently posed. The other day my colleague referred me to: Peter J. Rousseeuw (1987). "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis".

A brief overview of this concept can be found at: http://en.wikipedia.org/wiki/Silhouette_(clustering)

It seems like this method would provide a solution to your problem.
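The silhouette value of a point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b): s = (b - a) / max(a, b). Below is a minimal pure-Python sketch, assuming Euclidean distance and the usual convention that a point alone in its cluster scores 0; the function name is hypothetical.

```python
import math
from collections import defaultdict


def silhouette_scores(points, labels):
    """Return one silhouette value per point, each in [-1, 1]."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in groups[label] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
print(sum(good) / len(good), sum(bad) / len(bad))
```

Values near 1 indicate that a point sits well inside its cluster, values near 0 that it lies between clusters, and negative values that it is probably assigned to the wrong cluster. The mean over all points gives a single score for the clustering.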

Joachim Pimiskern

Let the users decide. In an information system, I think, users are interested in the results of a single cluster as the result set of one query. So present the users an HTML page with the best hits of a variety of clusters. The users will with a high probability click only on the results contained in a single cluster.

As far as I know this is what Google is doing: instead of returning the page with the highest rank, then the next lower rank and so on, Google presents some of the highest ranked pages of one cluster, then some high ranked pages of the next cluster and so on. It is even possible that while you're clicking on results, Google might deduce in the background what cluster you're interested in and switch to results only from that cluster on subsequent pages.

Assuming users are only interested in results of one cluster, you can compare their click behavior with the members of the most pertinent generated cluster.

Regards,

Joachim
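The click-based comparison described above could be sketched as follows. The log format, the document IDs, and the `click_concentration` helper are all hypothetical; the idea is only to measure how many of a user's clicks fall inside a single generated cluster.

```python
from collections import Counter


def click_concentration(clicked_docs, doc_to_cluster):
    """Return (dominant_cluster, share_of_clicks_in_that_cluster).

    clicked_docs: document IDs one user clicked for one query (assumed format).
    doc_to_cluster: mapping from document ID to generated cluster ID.
    Clicks on documents without a cluster assignment are ignored.
    """
    counts = Counter(
        doc_to_cluster[doc] for doc in clicked_docs if doc in doc_to_cluster
    )
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    cluster, hits = counts.most_common(1)[0]
    return cluster, hits / total


doc_to_cluster = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
print(click_concentration(["d1", "d2", "d3"], doc_to_cluster))
```

Under the stated assumption that users want results from one cluster, a share close to 1 across many sessions would suggest that the clusters match what users consider related.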

Malay K. Pakhira

A suitable cluster validity index may be used.
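The answer does not name a particular index. As one illustration, here is a minimal sketch of the Davies-Bouldin index, chosen here only as an example of an internal validity index; it assumes Euclidean distance and centroid-based scatter, and lower values indicate better clustering.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    if len(groups) < 2:
        raise ValueError("the index needs at least two clusters")

    centroids = {}
    scatter = {}
    for label, members in groups.items():
        dim = len(members[0])
        c = tuple(sum(p[i] for p in members) / len(members) for i in range(dim))
        centroids[label] = c
        # Scatter: average distance of the members to their centroid.
        scatter[label] = sum(math.dist(p, c) for p in members) / len(members)

    worst_ratios = []
    for i in groups:
        ratios = []
        for j in groups:
            if i == j:
                continue
            d = math.dist(centroids[i], centroids[j])
            # Coinciding centroids mean the clusters are not separated at all.
            ratios.append((scatter[i] + scatter[j]) / d if d else float("inf"))
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / len(worst_ratios)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(davies_bouldin(points, [0, 0, 1, 1]))
print(davies_bouldin(points, [0, 1, 0, 1]))
```

Comparing the index across several candidate clusterings of the same data, for example different numbers of clusters, is the typical use.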