What would be a good measure to use with centroid cluster classification?

Marcus Vinicius dos Santos Araujo

OK.

I didn't quite understand what exactly you want. Data mining involves three steps: Data Representation, Dissimilarity Measure, and the Clustering/Classification algorithm. Which of these steps are you asking about? You mentioned classification and clustering, but then you asked about a dissimilarity measure (Euclidean).

Data Representation: What kind of data are you working with? It will constrain the choices in the following steps.

Distance Measure: Have you tried other Minkowski distances besides Euclidean? I've seen that you work in meteorology, so your data may involve time series; in that case, try Dynamic Time Warping (DTW).
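To illustrate why DTW suits time series, here is a minimal pure-Python sketch of the classic dynamic-programming DTW distance. It assumes absolute difference as the local cost and no warping-window constraint (practical implementations often add a window, such as a Sakoe-Chiba band, for speed). The function name and the toy series are illustrative.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j];
    the local cost of matching a[i-1] with b[j-1] is their absolute difference.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # step in a only (b[j-1] is matched again)
                cost[i][j - 1],      # step in b only (a[i-1] is matched again)
                cost[i - 1][j - 1],  # step in both sequences
            )
    return cost[n][m]


if __name__ == "__main__":
    x = [0, 0, 1, 2, 1, 0, 0]
    y = [0, 1, 2, 1, 0, 0, 0]  # same shape as x, one time step earlier
    print("Euclidean:", math.dist(x, y))  # point-by-point comparison
    print("DTW:", dtw_distance(x, y))     # warping absorbs the shift
```

Here the Euclidean distance is 2.0 while the DTW distance is 0.0, because DTW may stretch the time axis to line up the two peaks. DTW also accepts series of different lengths, which the Euclidean distance does not.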

Clustering/Classification: Which methods have you already applied? Do you have a training set? Initially, I would try k-means (clustering) and k-NN (classification).
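As a starting point for the two suggested methods, here is a minimal pure-Python sketch of Lloyd's k-means (initial centroids drawn at random from the data, Euclidean distance) and a majority-vote k-NN classifier. The function names, the seed and the toy data are illustrative assumptions; in practice a library implementation would normally be used.

```python
import math
import random
from collections import Counter


def nearest_index(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [nearest_index(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(pts, k=2)
    print(centroids, labels)
    print(knn_predict(pts, ["a", "a", "a", "b", "b", "b"], (9, 9)))
```

k-means is unsupervised and needs no training set, whereas k-NN needs labelled training data, which is why the question about a training set matters.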

Quality Measures: How are you measuring the quality of your results? Accuracy can be misleading. If you have a ground truth, try precision, recall and F-measure; if you don't, try internal indices such as the silhouette coefficient or the Dunn index (neither needs a ground truth).
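To make the accuracy caveat concrete, here is a small pure-Python sketch of precision, recall and F-measure for one chosen positive class, plus the mean silhouette coefficient. The silhouette uses Euclidean distance and the usual convention that a point alone in its cluster scores 0. Function names and toy data are illustrative assumptions.

```python
import math


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of p's own cluster
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # singleton cluster
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Imbalanced toy labels: predicting 0 everywhere gives 90% accuracy...
    y_true = [1] + [0] * 9
    y_pred = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    # ...but precision, recall and F1 for class 1 are all zero.
    print("P/R/F1:", precision_recall_f1(y_true, y_pred, positive=1))

    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print("silhouette:", silhouette(pts, [0, 0, 0, 1, 1, 1]))
```

On the toy labels, accuracy is 0.9 while precision, recall and F1 for the rare class are all 0.0; the silhouette of the two well-separated groups is close to 1.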

I hope it helps. =)

Nasser Towghi

To interpret your question: there are two parts here: (1) forming the clusters and defining their centroids, and then (2) classifying an observation based on some "distance" metric from these centroids. This can be done either with training-set data or without it.
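With a labelled training set, the two steps described here reduce to a nearest-centroid classifier: (1) average the training vectors of each class to get its centroid, and (2) assign a new observation to the class whose centroid is closest. A minimal pure-Python sketch, assuming plain Euclidean distance by default (any two-argument distance function can be passed in instead); the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Step (1): one centroid per class, the mean of that class's vectors."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def predict(centroids, x, distance=math.dist):
    """Step (2): label of the centroid closest to x under `distance`."""
    return min(centroids, key=lambda label: distance(x, centroids[label]))


if __name__ == "__main__":
    X = [(0, 0), (0, 2), (2, 0), (10, 10), (10, 12), (12, 10)]
    y = ["A", "A", "A", "B", "B", "B"]
    cents = fit_centroids(X, y)
    print(cents)
    print(predict(cents, (3, 3)))
```

Without a training set, the centroids would instead come from a clustering step, and the same `predict` rule applies afterwards.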

Consider the case where you have training-set data, that is, you know the classification of the data (the case without a training set is similar). Let's start with (1): forming your clusters. Each cluster is a set of data points representing the same class, and each data point is represented by a vector in Euclidean space. If the dimension of the data is relatively small compared to the number of data points in the cluster, then finding a closed form of the distribution from the empirical distribution and interpolation would be the ideal situation. In general, however, this may not be feasible, and the closed form may be too complicated to work with. Nonetheless, it is imperative to check whether the empirical density function has multiple local maxima. Roughly speaking, the number of maxima indicates the number of sub-clusters by which the cluster (class) is best represented. At this point you can use the locations of these local maxima to model the density as a Gaussian mixture, with the local maxima as the means of the mixture components. These mean vectors are your centroids for the cluster in question, so you may have multiple centroids, indicating that your classes are in fact subdivided into sub-classes. Each sub-class is identified by three parameters: its mean vector, its covariance matrix, and the weight of the corresponding mixture component (sub-centroid). Now you can classify a data point by measuring its Mahalanobis distance to each sub-centroid (a sort of Euclidean distance normalized by the covariance matrix).
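The final classification step can be sketched in pure Python as follows. Each sub-class is given as a (label, mean vector, covariance matrix) triple, assumed to have been estimated already (for example from a fitted Gaussian mixture); several triples may share the same class label. The mixture weights are not used by this simple minimum-distance rule. The Mahalanobis distance sqrt((x - mu)^T Sigma^-1 (x - mu)) is computed by solving Sigma z = x - mu instead of inverting Sigma explicitly. All names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]  # augmented
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (near) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        z[r] = (M[r][d] - sum(M[r][c] * z[c] for c in range(r + 1, d))) / M[r][r]
    return z


def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^-1 (x - mu))."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)  # z = cov^-1 (x - mu)
    return math.sqrt(max(0.0, sum(di * zi for di, zi in zip(diff, z))))


def classify(x, components):
    """Label of the (label, mean, cov) component with the smallest Mahalanobis distance."""
    label, _, _ = min(components, key=lambda comp: mahalanobis(x, comp[1], comp[2]))
    return label


if __name__ == "__main__":
    components = [
        ("tight", [0.0, 0.0], [[0.25, 0.0], [0.0, 0.25]]),  # small spread
        ("wide", [6.0, 0.0], [[9.0, 0.0], [0.0, 1.0]]),     # large spread along x
    ]
    x = [3.0, 0.0]  # Euclidean distance is 3 to both means
    print({name: mahalanobis(x, mu, cov) for name, mu, cov in components})
    print(classify(x, components))
```

The query point lies exactly halfway between the two means in Euclidean terms, but its Mahalanobis distance is 6 to the tight component and about 1 to the wide one, so it is assigned to "wide". Weighting components by their mixture weights (for example via the full Gaussian likelihood) would be a natural refinement.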

If your data points are N-dimensional vectors and the number of classes you need to classify is M, where M

Emmanuel Tuyishimire

Hi there,

What problem are you working on, and why is the Euclidean distance not an efficient metric for it? What kind of data set do you have? Tell us a little about the structure of your data set (its metric space). The choice of a clustering metric (the distance function, i.e. the utility function) depends on the underlying problem and the structure of the data set, and these are not clear here.

Miguel Hinojosa Fernández

Something like a dynamic time warping measure is exactly what I was looking for. I started with k-means using Euclidean distance, but the real problem I have involves time series, so thanks for your recommendation in this regard.

Bryar Hassan

Well, clustering algorithms use several measures to calculate the distance for numerical attributes (between the centroid and the observations in the same cluster):

- Minkowski Distance;

- Manhattan Distance;

- Euclidean Distance;

- Chebyshev Distance;

- Cosine Distance; and

- Jaccard Distance.

Each of them has pros and cons. You may try one of them based on your situation.
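For reference, the six measures listed above fit in a few lines of pure Python. Jaccard distance is normally defined on sets, so this sketch treats the non-zero positions of each vector as set members (an assumption; weighted variants exist for non-negative real vectors). Cosine distance is taken as 1 minus cosine similarity. Function names are illustrative.

```python
import math


def minkowski(a, b, p):
    """(sum |a_i - b_i|^p)^(1/p); p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """Limit of Minkowski as p -> infinity: the largest coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cos(angle); ignores vector length, undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|, where A, B are the positions of non-zero entries."""
    set_a = {i for i, x in enumerate(a) if x}
    set_b = {i for i, y in enumerate(b) if y}
    union = set_a | set_b
    if not union:
        return 0.0  # two all-zero vectors are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


if __name__ == "__main__":
    u, v = [0, 0], [3, 4]
    for name, fn in [("Manhattan", manhattan), ("Euclidean", euclidean),
                     ("Chebyshev", chebyshev)]:
        print(name, fn(u, v))
    print("Minkowski p=3", minkowski(u, v, 3))
    print("Cosine", cosine_distance([1, 0], [1, 1]))
    print("Jaccard", jaccard_distance([1, 1, 0], [1, 0, 1]))
```

The Minkowski-family distances are sensitive to the scale of each feature, so standardizing features first is common; cosine distance compares only directions and ignores vector length.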