I'm looking for a method for unsupervised classification of big data with an unknown number of clusters. Can you suggest a robust method? Is there any Matlab toolbox dedicated to this purpose?
Thanks, dear Majid, for your response ... mixture models necessitate specifying the number of components ... so, how can I determine the number of components that best represents the data distribution? Is there any MATLAB function or external toolbox for this purpose?
The approach of AutoClass, which automatically finds the natural classes, is pretty cool. You might want to look at http://ti.arc.nasa.gov/tech/rse/synthesis-projects-applications/autoclass/ and the research papers based upon AutoClass.
Good evening, dear colleagues ... thank you all for your interesting responses ... a friend suggested the PG-means and XPG-means methods to me. Do you have any idea about them? Is there any implementation of them available on the internet?
You can do clustering and use the Mean Split Silhouette (MSS) as a measure of cluster heterogeneity. You can also use it to estimate the number of significant clusters: choose the number of clusters that minimizes the MSS and therefore produces the most homogeneous clusters in the data.
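The MSS itself is not shipped with common toolboxes, but its close relative, the mean silhouette score, is. A minimal sketch in Python with scikit-learn (in MATLAB, `evalclusters` with the `'silhouette'` criterion plays the same role); note that the mean silhouette is *maximized*, whereas the MSS is minimized:

```python
# Pick the number of clusters k by maximizing the mean silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 3 well-separated clusters (illustration only).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean silhouette over all points

best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
```

On this toy data the score peaks at the true number of clusters; on real data the curve is flatter, so it pays to inspect the whole score-vs-k profile rather than just its maximum.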
Regardless of the generic learning method adopted for the given classification task, Big Data (where by Big Data I understand a scale comparable with data from social networks like Facebook, Twitter, and LinkedIn; data from web blogs, comments, and personal documents; data from public image repositories like Instagram, Flickr, and Picasa, and from movie repositories like YouTube; data from internet searches or from large prime-number searches; etc.) requires some specific adaptations, such as negotiating a good balance between online learning, partial learning, and parallel/distributed learning. The result of this negotiation should, of course, be compatible with the manner in which you choose to express and test the levels of intra-class similarity and inter-class dissimilarity, which, on the other hand, are very much data-specific. These are the critical aspects when designing classification algorithms for Big Data. As for ready-to-run algorithms for a specific problem, I'm not so optimistic. A nice inventory of Big Data techniques is here: http://www.mapr.com/blog/big-data-zz-%E2%80%93-glossary-my-favorite-data-science-things#.UzAwkaiSwsA
If you are looking for some well-founded probabilistic math, check out non-parametric Bayesian methods. Distributions such as the Dirichlet Process or Pitman-Yor process allocate some probability to unseen classes.
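As a concrete illustration of the nonparametric Bayesian idea: scikit-learn's `BayesianGaussianMixture` supports a (truncated, variational) Dirichlet-process prior, so you give it an upper bound on the number of components and the prior shrinks the weights of unneeded ones toward zero. A sketch, with synthetic data standing in for your own:

```python
# Dirichlet-process Gaussian mixture: infer the effective number of
# components instead of fixing it in advance.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import BayesianGaussianMixture

# Toy data with 3 clusters; the model is only told "at most 10".
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.5, random_state=42)

bgm = BayesianGaussianMixture(
    n_components=10,  # truncation level, i.e. an upper bound
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=42,
).fit(X)

labels = bgm.predict(X)
n_effective = len(np.unique(labels))  # components actually used by the data
```

The `weight_concentration_prior` (the DP concentration parameter) controls how eagerly new components are created; smaller values favor fewer clusters.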
If you want to know the number of clusters in your data, you can run a clustering algorithm on it. But some clustering algorithms, like k-means, need the number of clusters (k) as a parameter, and you must find the best k by computing an internal validity measure, like the SSE.
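The usual way to use the SSE is the "elbow" heuristic: run k-means for a range of k, record the SSE (k-means inertia), and look for the k where the improvement levels off. A short sketch with scikit-learn (MATLAB's `kmeans` returns the same within-cluster sums in its `sumd` output):

```python
# Elbow heuristic: SSE (inertia) versus k for k-means.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=1)

sse = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    sse.append(km.inertia_)  # sum of squared distances to closest centroid

# SSE always decreases as k grows; the "elbow" is where the
# per-step improvement drops off sharply.
drops = [sse[i] - sse[i + 1] for i in range(len(sse) - 1)]
```

Because the SSE decreases monotonically in k, you cannot simply minimize it; you inspect `drops` (or plot `sse`) and pick the k just after the largest fall-off.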
If you want to find the number of clusters by starting from a deliberately large number of clusters, you can read this paper: https://www.researchgate.net/publication/221908653_Learning_the_Number_of_Clusters_in_Self_Organizing_Map?ev=prf_pub