One heuristic approach is to divide the data set into classes of equal size: with n objects and k classes (say k = 3), each class receives roughly n/k objects. An alternative rule of thumb sets k ≈ (n/2)^(1/2).
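The rule of thumb above can be sketched in a few lines; the function name and the rounding choice are my own, not part of any standard library:

```python
import math

def rule_of_thumb_k(n: int) -> int:
    """Rough estimate of the cluster count: k ~ sqrt(n / 2)."""
    return max(1, round(math.sqrt(n / 2)))

print(rule_of_thumb_k(200))  # sqrt(100) → 10
```

This is only a starting point for k; the methods discussed below refine it against the actual data.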
K-means is an unsupervised (clustering) algorithm. The number of clusters is a guess, chosen at random, but subject to the following bounds: with n objects, the maximum number of clusters is n (each object forms its own cluster), and the minimum is 1 (all objects belong to a single cluster). Apart from these extremes, you may choose any value between 1 and n.
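To make the discussion concrete, here is a minimal sketch of Lloyd's k-means iteration (random initialization, assign, update) in plain NumPy; it is an illustration, not a production implementation, and the blob data at the bottom is made up for the demo:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: pick k random points as centers,
    then alternate assignment and center updates until stable."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# demo: two well-separated blobs, so k = 2 is the natural choice
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.3, (50, 2)),
               np.random.default_rng(2).normal(5.0, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
```

With k chosen anywhere between 1 and n the loop still runs; whether the result is meaningful is exactly what the criteria below try to judge.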
When k-means is used for data quantization, the number of classes is established by satisfying a condition on the quantization error amplitude. This works especially well for uniformly distributed random samples. If that is not the case (i.e. the data show an observable natural grouping tendency), the histogram exhibits agglomeration points (at different 'scales') as local maxima. Counting these points in a fuzzy manner (neglecting the local maxima that are not "so intense", until the optimization goal is best fitted) is a good practical way to answer your question.
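One simple way to count such agglomeration points is to take a histogram and count its local maxima, discarding the ones below some fraction of the tallest peak (the "not so intense" ones). The function and the `min_height` threshold below are my own illustrative choices, to be tuned against the optimization goal:

```python
import numpy as np

def count_modes(values, bins=20, min_height=0.5):
    """Count histogram peaks, ignoring local maxima shorter than
    min_height * (tallest bin). The threshold is a tunable assumption."""
    hist, _ = np.histogram(values, bins=bins)
    peaks = 0
    for i in range(len(hist)):
        left = hist[i - 1] if i > 0 else -1
        right = hist[i + 1] if i < len(hist) - 1 else -1
        if hist[i] > left and hist[i] > right and hist[i] >= min_height * hist.max():
            peaks += 1
    return peaks

# bimodal toy data: two clumps of mass with a sparse gap between them
vals = [0.0] * 10 + [1.0] * 2 + [2.0] * 10
print(count_modes(vals, bins=3))  # two intense peaks → 2
```

The peak count then serves as the candidate number of clusters k.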
There are also 'classical' rules such as the rule of thumb, the elbow rule, the average silhouette method, the Akaike/Bayesian/Deviance information criteria, the jump method, and cross-validation (when you have a goal function to optimize and a data set large enough to split into sub-parts, so that the number of clusters that best fits the optimization goal agrees across all sub-parts). However, their performance depends mainly on the nature of the data.
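Of these, the elbow rule is the easiest to sketch: given the within-cluster SSE (inertia) for k = 1, 2, ..., pick the k where the marginal improvement falls off most sharply. The second-difference heuristic and the inertia values below are illustrative assumptions, not the only way to locate the elbow:

```python
def elbow_k(inertias):
    """Pick k at the 'elbow' of the inertia curve, i.e. the (1-indexed) k
    where the drop in within-cluster SSE falls off most sharply."""
    # drop achieved by going from k = i+1 to k = i+2
    drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
    # how much each drop shrinks relative to the next one (second difference)
    falloff = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
    return falloff.index(max(falloff)) + 2

# hypothetical inertia values for k = 1..6
inertias = [1000, 500, 200, 120, 110, 105]
print(elbow_k(inertias))  # improvements flatten after k = 3
```

In practice one would run k-means for each candidate k, record the inertia, and feed the resulting curve to such a selector; the silhouette and information-criterion methods follow the same pattern with a different score.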