Sometimes a clustering algorithm that works well on a dataset with many features cannot be used on a dataset with few features. How does the number of features affect the choice of a suitable clustering algorithm?
Well, I wrote an algorithm based on type-2 fuzzy sets and ran it on 10 standard UCI datasets. On some datasets with about 4 features, my algorithm loses accuracy. For example, the Iris dataset has 150 samples and 4 features, and my algorithm loses accuracy there; on the other hand, the Wine dataset has 178 samples and 13 features, and the accuracy does not drop. My algorithm even achieves higher accuracy than General Type-2 FCM (GT2 FCM) on the Wine dataset.
I also added some noise to these datasets, and the results were the same.
A classic approach to testing clustering algorithms is to generate data from mixture distributions of various kinds. This will allow you to vary different parameters like dimensionality without changing the difficulty of the actual clustering.
For instance, one test for the initialization of k-means uses an even mixture of very small symmetric clusters positioned at a few corners of the unit hypercube. You can adjust the dimensionality of this data very easily and test how various algorithms treat it. Since you know the data is highly separated, you have also isolated the convergence properties from the initialization properties.
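A minimal sketch of this test, using only numpy (the generator, the plain Lloyd's k-means, and all parameter values here are my own illustrative choices, not a standard benchmark):

```python
import numpy as np

def make_corner_clusters(dim, n_clusters, n_per_cluster, scale=0.01, seed=0):
    """Even mixture of small symmetric Gaussian clusters placed at
    distinct corners of the unit hypercube (dim <= 20 here)."""
    rng = np.random.default_rng(seed)
    # pick n_clusters distinct corner indices, decode them to 0/1 vectors
    corners = rng.choice(2 ** dim, size=n_clusters, replace=False)
    centers = np.array([[(c >> b) & 1 for b in range(dim)] for c in corners],
                       dtype=float)
    X = np.vstack([centers[k] + scale * rng.standard_normal((n_per_cluster, dim))
                   for k in range(n_clusters)])
    y = np.repeat(np.arange(n_clusters), n_per_cluster)
    return X, y

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random-point initialization."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # final assignment against the last centers
    return np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)

X, y = make_corner_clusters(dim=8, n_clusters=4, n_per_cluster=50)
labels = kmeans(X, k=4)
```

Because the true clusters are tiny blobs separated by at least unit distance, any disagreement between `labels` and `y` (up to relabeling) points to an initialization failure rather than a convergence failure, and you can rerun the same test with `dim=4` or `dim=50` to isolate the effect of dimensionality.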
The higher the number of features, the more information is available and the more degrees of freedom you have, which means greater flexibility.
BUT more features do not necessarily provide more useful information! Your results will get worse if you add irrelevant features, because they introduce extra uncertainty and noise.
So the issue lies in how much you know about the features and how you choose the best combination of them.
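This effect is easy to see numerically. The toy setup below is my own illustration (two clusters separated on a single informative feature, with pure-noise features appended): as irrelevant dimensions are added, the gap between between-cluster and within-cluster distances shrinks, which is exactly what makes clustering harder:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# one informative feature: two tight clusters at 0 and 5
signal = np.concatenate([rng.normal(0.0, 0.3, n),
                         rng.normal(5.0, 0.3, n)])[:, None]
y = np.repeat([0, 1], n)

def contrast(X, y):
    """Mean between-cluster distance divided by mean within-cluster distance.
    Values near 1 mean the cluster structure is washed out."""
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    same = y[:, None] == y[None]
    within = D[same & (D > 0)].mean()
    between = D[~same].mean()
    return between / within

for d_noise in [0, 4, 16, 64]:
    noise = rng.normal(0.0, 1.0, (2 * n, d_noise))
    X = np.hstack([signal, noise])
    print(f"{d_noise:3d} noise features -> contrast {contrast(X, y):.2f}")
```

With no noise features the contrast is large (the clusters are obvious); with 64 irrelevant features it drops close to 1, even though the informative feature is still present. This is why feature selection, or a weighting scheme that can down-weight irrelevant features, matters more as dimensionality grows.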