I am working on clustering students according to their marks using K-means. Can anyone suggest how to semantically label/name the resulting clusters?

Robert Moulder

What program are you using? The kmeans function in R (part of the built-in stats package) works pretty well for this. When it comes to naming clusters, that is usually a choice made by knowing something about the data units in each cluster.
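To illustrate naming clusters from what is in them, here is a minimal, library-free Python sketch. It assumes each student is a tuple of marks out of 20, runs Lloyd's K-means with a simple deterministic start (an assumption; random restarts or K-means++ are common alternatives), then ranks the clusters by their centroid's average mark. The sample marks, the subject pair and the names "weak/average/strong" are hypothetical.

```python
from statistics import mean

def kmeans(points, k, iters=100):
    """Lloyd's K-means on a list of equal-length numeric tuples (marks per subject)."""
    # Deterministic start (assumption): spread the k seeds along the students'
    # overall-mean ordering instead of picking them at random.
    ordered = sorted(points, key=mean)
    n = len(ordered)
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to its members' mean; keep it if the cluster is empty.
        new = [tuple(mean(dim) for dim in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, clusters

def name_clusters(centroids, names=("weak", "average", "strong")):
    """Rank clusters by the overall mean of their centroid and attach a readable name."""
    order = sorted(range(len(centroids)), key=lambda i: mean(centroids[i]))
    return {cluster_id: names[rank] for rank, cluster_id in enumerate(order)}

# Hypothetical (math, physics) marks out of 20 for nine students.
marks = [(4, 5), (5, 3), (3, 4), (11, 10), (10, 12), (12, 11), (17, 18), (18, 16), (19, 19)]
centroids, clusters = kmeans(marks, k=3)
names = name_clusters(centroids)
for i, members in enumerate(clusters):
    print(names[i], [round(x, 1) for x in centroids[i]], members)
```

With R's kmeans or scikit-learn's KMeans doing the clustering instead, the naming step (sort the centroids, then attach labels) stays the same.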

Imad Rida

I think you have to use hierarchical clustering; that is what I have heard, but honestly I have never used it.
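For reference, agglomerative hierarchical clustering does not need k up front: it merges the closest groups step by step, and you decide afterwards where to cut. This is a minimal pure-Python sketch with average linkage on hypothetical one-dimensional average marks; a real analysis would more likely use R's hclust or SciPy's scipy.cluster.hierarchy.

```python
from itertools import combinations
from statistics import mean

def average_linkage(a, b):
    """Average linkage: mean absolute difference over all cross-cluster pairs of marks."""
    return mean(abs(x - y) for x in a for y in b)

def agglomerative(values, k):
    """Agglomerative (bottom-up) hierarchical clustering of 1-D marks.

    Starts with one cluster per student and repeatedly merges the closest pair
    until k clusters remain. Also returns the linkage distance of every merge;
    a sudden jump in these heights is a common hint for where to cut the tree.
    """
    clusters = [[v] for v in values]
    heights = []
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        heights.append(average_linkage(clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so position i is unaffected
    return clusters, heights

# Hypothetical average marks (out of 20) for nine students.
averages = [4.0, 3.5, 4.5, 10.5, 11.0, 11.5, 17.0, 17.5, 19.0]
groups, heights = agglomerative(averages, k=3)
print(groups)
print(heights)
```

On this toy data the merge heights stay small (at most 1.75) until three clusters remain; the next merge would join far-apart groups at a much larger distance, which suggests cutting at three.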

ASIM ULLAH JAN

Specifying the number of clusters is the job of a domain specialist. There may be some methods for automating the choice of K, but the results will be affected. For labeling, you can use Naive Bayes and decision trees.
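A minimal sketch of the decision-tree idea, under assumptions: the cluster ids are taken as already produced by K-means, and only a depth-1 tree (a decision stump) is fitted per cluster, one cluster versus the rest. The readable rule it finds, a threshold on one subject, can suggest a name. The subjects, marks and cluster ids are hypothetical; a full tree or a Naive Bayes model would normally come from a library such as scikit-learn.

```python
FEATURES = ("math", "literature")  # hypothetical subject names

def best_stump(rows, is_member):
    """Fit a depth-1 decision tree (a stump) that separates one cluster from the rest.

    rows: list of mark tuples; is_member: booleans, True if the row is in the cluster.
    Tries every feature, every observed mark as a threshold, and both directions,
    and returns (accuracy, feature_index, threshold, direction) for the best rule.
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for direction in (">=", "<"):
                # Predict "in the cluster" when the rule holds.
                preds = [(r[f] >= t) == (direction == ">=") for r in rows]
                acc = sum(p == m for p, m in zip(preds, is_member)) / len(rows)
                if best is None or acc > best[0]:
                    best = (acc, f, t, direction)
    return best

# Hypothetical (math, literature) marks and the cluster ids an earlier K-means run returned.
students = [(16, 8), (17, 9), (18, 7), (7, 16), (8, 17), (6, 18), (4, 5), (3, 4), (5, 3)]
cluster_of = [0, 0, 0, 1, 1, 1, 2, 2, 2]

for c in sorted(set(cluster_of)):
    acc, f, t, direction = best_stump(students, [k == c for k in cluster_of])
    print(f"cluster {c}: {FEATURES[f]} {direction} {t} (accuracy {acc:.2f})")
```

Here the rules suggest names such as "strong in math" and "strong in literature". A stump looks at one subject at a time, so the third cluster's rule mentions math alone even though those students are also weak in literature.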

Zafar Ali

As Asim Ullah suggested, you can use Naive Bayes and decision trees for labeling your classes; however, the value of k depends on the data you are going to classify. For selecting k, you could adopt a random approach, as I have heard that k is usually selected randomly.

Dear Zafar, a random approach does not work well; only a domain expert in the field the data comes from can suggest the number of clusters accurately. For example, if you want to cluster student data on the basis of marks, you already know how many clusters are required: perhaps you want to divide the students into A, B, C, etc. grades, or use some other criterion.
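If the expert's grade bands fix k, they can also seed K-means so that each cluster keeps a grade name. A minimal sketch, assuming hypothetical A/B/C cut-offs on a 0 to 20 scale and one average mark per student; K-means can still move a centroid outside its starting band, so the names should be checked afterwards.

```python
from statistics import mean

# Hypothetical grade bands on a 0-20 scale, as an instructor might choose them: name -> (low, high).
BANDS = {"A": (14, 20), "B": (10, 14), "C": (0, 10)}

def kmeans_1d(values, seeds, iters=100):
    """1-D Lloyd's K-means started from the given seed centroids (one per band)."""
    centroids = list(seeds)
    for _ in range(iters):
        # Assignment step: each mark joins its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: centroid = mean of its group (unchanged if the group is empty).
        new = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids, groups

names = list(BANDS)                                   # k = 3, fixed by the expert's bands
seeds = [(lo + hi) / 2 for lo, hi in BANDS.values()]  # start each centroid mid-band
marks = [4.0, 3.5, 4.5, 10.5, 11.0, 11.5, 17.0, 17.5, 19.0]  # hypothetical average marks
centroids, groups = kmeans_1d(marks, seeds)
for name, c, g in zip(names, centroids, groups):
    print(name, round(c, 2), g)
```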

ASIM ULLAH JAN,

You are right, brother, but in various publications I have noticed that k is selected randomly. When the number of classes is not known in advance, k should be selected randomly and can be changed later on; however, when the classes are known in advance, Random Forest or SVM is the appropriate approach to adopt.
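Rather than picking k purely at random, one common alternative (often called the elbow method, not named in this thread) is to run K-means for several values of k and compare the within-cluster sum of squared errors (SSE), taking k where adding clusters stops reducing SSE sharply. A minimal pure-Python sketch, with hypothetical average marks and a deterministic start chosen for reproducibility:

```python
from statistics import mean

def kmeans_sse(values, k, iters=100):
    """1-D Lloyd's K-means; returns the within-cluster sum of squared errors (SSE)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (assumption): spread the k seeds over the sorted marks.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        new = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return sum((v - c) ** 2 for g, c in zip(groups, centroids) for v in g)

marks = [4.0, 3.5, 4.5, 10.5, 11.0, 11.5, 17.0, 17.5, 19.0]  # hypothetical average marks
sse = {k: kmeans_sse(marks, k) for k in range(1, 6)}
for k, err in sse.items():
    print(k, round(err, 2))
```

On this toy data SSE drops steeply up to k = 3 and only slightly afterwards, matching the three groups built into the example.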

Regards

Thanks, Zafar.

Sisay Chala

Expert heuristics are the best means of determining the number of clusters. There is also a 'Rule of Thumb' you can use: K is approximately the square root of n/2, where n is the number of data points.
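As a quick worked example of the rule of thumb, with n the number of students and the result rounded to the nearest whole number (the rounding is an assumption):

```python
import math

def rule_of_thumb_k(n):
    """Rule-of-thumb cluster count k = round(sqrt(n / 2)), never less than 1."""
    return max(1, round(math.sqrt(n / 2)))

for n in (50, 200, 1000):
    print(n, rule_of_thumb_k(n))
```

For a cohort of 200 students this gives 10 clusters, noticeably more than a three-band A/B/C scheme would, so the two heuristics can disagree.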

The following articles may be useful:

http://www.ijarcsms.com/docs/paper/volume1/issue6/V1I6-0015.pdf

http://papers.nips.cc/paper/2526-learning-the-k-in-k-means.pdf

http://www.ee.columbia.edu/~dpwe/papers/PhamDN05-kmeans.pdf

Doulkifli Boukraâ

Thank you for your helpful answers.