How to choose the best clustering algorithm for high-dimensional, power-law and non-normal data?

Mohamad M. Awad

Dear Dr. Vu,

You should check whether your variables are correlated so that you can remove redundant data, which can affect the outcome of any clustering algorithm. In addition, normalizing all variables in the same way (to a common scale) can help speed up the process and avoid local optima.
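A minimal sketch of the correlation check, assuming numpy and a samples-by-variables array `X`. The helper name `redundant_pairs` and the 0.9 threshold are illustrative choices, not part of the answer. Pearson correlation only captures linear relationships; for heavily skewed, non-normal variables a rank-based (Spearman) correlation is a common alternative.

```python
import numpy as np

def redundant_pairs(X, threshold=0.9):
    """List (i, j, r) for column pairs whose |Pearson r| exceeds threshold.

    Hypothetical helper: columns of X are variables, rows are observations.
    Constant columns yield NaN correlations, which never exceed the threshold.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    n = corr.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) > threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs

# Example: column 1 is a linear copy of column 0, column 2 is unrelated.
a = np.arange(10.0)
b = np.array([1.0, -1.0] * 5)
X = np.column_stack([a, 2 * a + 1, b])
print(redundant_pairs(X))
```

Each reported pair is a candidate for keeping only one of the two variables before clustering.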

Several clustering algorithms can perform better than K-means here, such as fuzzy c-means (FCM) or self-organizing maps (SOM).
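FCM is easy to sketch from scratch. The version below is a minimal numpy illustration, assuming Euclidean distance and the commonly used fuzzifier `m = 2`; the function name and the convergence settings are hypothetical. Unlike K-means, every point gets a membership degree in every cluster, and each row of memberships sums to 1.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means. Returns (centers, memberships U of shape (n, c))."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row is a membership distribution
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        W = U ** m
        # centers are membership-weighted means of the data
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # distance from every point to every center, shape (n, c)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-p)
        U_new = inv / inv.sum(axis=1, keepdims=True)  # u_ij proportional to d_ij^(-2/(m-1))
        if np.max(np.abs(U_new - U)) < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```

Hard labels can be read off with `U.argmax(axis=1)`, while the soft memberships show how ambiguous each point's assignment is.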

Send me a sample of your data and I will check it.

Vincent F Adegoke

You can try the K-means clustering algorithm, DBSCAN, subspace clustering algorithms, or spectral clustering, among others.
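To show how DBSCAN differs from K-means, here is a small from-scratch numpy sketch (the name `dbscan` and the parameter values are illustrative assumptions). Points with at least `min_samples` neighbors within radius `eps` are core points, clusters grow outward from core points, and points reachable from no core point are labeled noise (-1). It assumes Euclidean distance and a dataset small enough for an n-by-n distance matrix. In high-dimensional spaces Euclidean distances tend to become similar to one another, which makes `eps` harder to choose.

```python
import numpy as np

def dbscan(X, eps, min_samples=5):
    """Minimal DBSCAN: returns cluster ids 0, 1, ... per point, or -1 for noise."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # dense pairwise Euclidean distances: O(n^2) memory, fine for small datasets
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]  # includes the point itself
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster  # start a new cluster from an unassigned core point
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels
```

Unlike K-means, the number of clusters is not fixed in advance; it follows from `eps` and `min_samples`.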

These articles might also be helpful: https://arxiv.org/ftp/arxiv/papers/1501/1501.02431.pdf and https://machinelearningmastery.com/clustering-algorithms-with-python/

Good luck.

Giuseppe D'Alessio

I think you should definitely preprocess your data before applying any kind of clustering algorithm in this case. And yes, I think it is a good idea to scale the data so that each variable in your dataset has a mean of 0 and a standard deviation of 1.
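A minimal numpy sketch of this z-score standardization, assuming rows are observations and columns are variables (`standardize` is a hypothetical helper name). Standardizing changes only location and scale: a heavy-tailed, power-law variable stays heavy-tailed afterwards.

```python
import numpy as np

def standardize(X):
    """Scale each column (variable) to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (ddof=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by zero
    return (X - mean) / std
```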

As for the best clustering algorithm: you did not mention the size of your data, but if the number of observations is not too high, I recommend spectral clustering.
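A minimal numpy sketch of normalized spectral clustering (in the style of Ng, Jordan and Weiss), assuming a Gaussian affinity with a hand-picked bandwidth `sigma`. `spectral_clustering` and `_kmeans` are hypothetical names, and the small k-means routine is included only to keep the block self-contained. The dense n-by-n affinity matrix and its eigendecomposition (cubic in n) are why this approach suits datasets where the number of observations is not too high.

```python
import numpy as np

def _kmeans(Y, k, n_iter=100, seed=0):
    """Plain k-means with farthest-point initialization (helper for this sketch)."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, k):
        # next center: the row farthest from all centers chosen so far
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(X, k, sigma=1.0):
    """Normalized spectral clustering with a Gaussian (RBF) affinity."""
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
    A = np.exp(-sq / (2.0 * sigma ** 2))  # affinity matrix, n x n
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.fmax(A.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -k:]             # eigenvectors of the k largest eigenvalues
    U = U / np.fmax(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # unit-length rows
    return _kmeans(U, k)
```

The choice of `sigma` matters a great deal; on standardized data a value near the typical nearest-neighbor distance is a reasonable starting point to tune from.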

Dear Prof. Dinh Truong Vu,

I investigated your data and tried the following three steps, which work for clustering it:

1- Normalize the variables with large values based on the maximum value in each column (variable).

2- Compute the correlation between these columns and reduce the number of variables (columns) according to the correlation value (for example, a threshold of 0.5).

3- Use self-organizing maps (SOM) with different net sizes and compare the results.
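The three steps can be sketched in numpy as follows. This is an illustration under stated assumptions, not the answerer's actual code: the helper names (`max_normalize`, `drop_correlated`, `train_som`, `som_labels`, `quantization_error`), the greedy rule for dropping correlated columns, and the SOM learning-rate and neighborhood schedules are all hypothetical choices. For simplicity, step 1 divides every column by its maximum absolute value, and step 2 uses the 0.5 threshold mentioned above.

```python
import numpy as np

def max_normalize(X):
    """Step 1 (hypothetical helper): divide each column by its maximum absolute value."""
    X = np.asarray(X, dtype=float)
    scale = np.max(np.abs(X), axis=0)
    scale[scale == 0] = 1.0  # leave all-zero columns unchanged
    return X / scale

def drop_correlated(X, threshold=0.5):
    """Step 2 (hypothetical helper): greedily keep a column only if its |r| with
    every column kept so far is at most `threshold`. Returns (reduced X, kept indices)."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(not corr[j, k] > threshold for k in keep):
            keep.append(j)
    return X[:, keep], keep

def train_som(X, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Step 3 (hypothetical helper): online training of a rows x cols SOM.
    Returns one weight vector per map node, shape (rows * cols, n_features)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # grid coordinates of each node, used by the neighborhood function
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(len(X), size=rows * cols)].copy()  # initialize from random samples
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks to a floor
        x = X[rng.integers(len(X))]
        bmu = np.argmin(np.sum((W - x) ** 2, axis=1))  # best-matching unit
        g = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
        W += lr * g[:, None] * (x - W)  # pull the BMU and its grid neighbors toward x
    return W

def som_labels(X, W):
    """Index of the best-matching node for each sample (nodes act as clusters)."""
    d = np.sum((np.asarray(X, dtype=float)[:, None, :] - W[None, :, :]) ** 2, axis=2)
    return np.argmin(d, axis=1)

def quantization_error(X, W):
    """Mean Euclidean distance from each sample to its best-matching node."""
    d = np.linalg.norm(np.asarray(X, dtype=float)[:, None, :] - W[None, :, :], axis=2)
    return float(d.min(axis=1).mean())
```

To compare net sizes, one could train maps such as 2x2, 3x3 and 5x5 on the reduced data and compare `quantization_error` alongside a visual inspection of the maps. Because the quantization error almost always shrinks as nodes are added, it should not be the only criterion.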

Good luck

Eugene Veniaminovich Lutsenko

You can try true clustering