How to decide the best classifier based on the data-set provided?

Daoqiang Zhang

Perform cross-validation (e.g., 10-fold CV or leave-one-out, LOO).
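As a rough illustration of what k-fold cross-validation and LOO do mechanically, here is a minimal sketch in plain Python. The nearest-centroid classifier, the helper names and the tiny data set are all made up for illustration; they are not from this thread. In practice you would shuffle (and usually stratify) the samples before splitting.

```python
from collections import defaultdict


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_centroid_predict(X_train, y_train, X_test):
    """Toy classifier: assign each test point to the closest class mean."""
    groups = defaultdict(list)
    for x, label in zip(X_train, y_train):
        groups[label].append(x)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X_test]


def cross_val_accuracy(predict_fn, X, y, k):
    """Mean accuracy over k folds; k == len(X) gives leave-one-out."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = predict_fn(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        hits = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Tiny made-up two-class data set with interleaved labels (illustration only).
X = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0], [5.1, 4.9], [0.1, 0.3], [4.8, 5.0]]
y = ["a", "b", "a", "b", "a", "b"]
print(cross_val_accuracy(nearest_centroid_predict, X, y, k=3))       # 3-fold CV
print(cross_val_accuracy(nearest_centroid_predict, X, y, k=len(X)))  # LOO
```

Any function with the same `(X_train, y_train, X_test)` signature can be passed in place of `nearest_centroid_predict`, so the same loop can score several candidate classifiers.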

Negar Ahmadi

Your question is very general. The answer depends on the problem, on your data set, and on how the data are presented. Please explain more so that others can help you in the right way. Good luck.

Ranjan Piyush

Dear Zhang,

Cross-validation comes after the classifier is selected. I meant to ask about initial tests that could be performed on the data itself to guide me in selecting a particular type of classifier.

Dear Negar,

I wanted to know, in general terms, whether there is a set of rules that would help me choose a classifier for any given dataset.

Tiago A. Almeida

As far as I know, there is no well-defined rule for this task. In general, it depends on the kind of data and on the number of samples versus the number of features. For instance, I would recommend naive Bayes or a linear SVM for text classification/categorization. For datasets with numerical attributes, I would suggest a linear SVM, neural networks, or logistic regression if the number of features is much greater than the number of samples. On the other hand, I would recommend neural networks or an SVM with an RBF or polynomial kernel if the number of samples is not too large and is greater than the number of features. If the number of samples is huge, I would suggest neural networks or a linear SVM, and so on. Obviously, there are other options for each scenario besides those I have mentioned.
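The rules of thumb above can be written down as a small lookup function. This is only a sketch: the function name and the two cut-offs (`much_greater` and `huge`) are hypothetical, since the answer gives no numbers for "much greater" or "huge". Treat the output as a shortlist to compare empirically, not as a decision.

```python
def suggest_classifiers(n_samples, n_features, is_text=False,
                        much_greater=10, huge=100_000):
    """Return candidate classifier families for a dataset of the given shape.

    `much_greater` and `huge` are made-up cut-offs, not values from the thread.
    """
    if is_text:
        return ["naive Bayes", "linear SVM"]
    # Many more features than samples: prefer linear / simple models.
    if n_features >= much_greater * n_samples:
        return ["linear SVM", "neural network", "logistic regression"]
    # Very many samples: kernel SVMs tend to become expensive.
    if n_samples >= huge:
        return ["neural network", "linear SVM"]
    # Moderate number of samples, more samples than features.
    if n_samples > n_features:
        return ["neural network", "SVM (RBF or polynomial kernel)"]
    # Cases the answer does not cover: no suggestion.
    return []


print(suggest_classifiers(n_samples=50, n_features=5000))
print(suggest_classifiers(n_samples=2000, n_features=20))
print(suggest_classifiers(n_samples=1_000_000, n_features=20))
```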

Dear Alejandro,

I have read about the evaluation techniques that you mentioned. I would like to know more about meta-learning, or other methods in which the data itself picks the classifier with the least effort on the user's side. It would be kind of you to suggest some papers on this.

Dear Tiago,

Thanks for such an explanatory answer with examples. It would be appreciated if you could suggest some papers that explain the selection of classifier based on data-sets (some sort of review paper).

Dear Veronika,

Thanks for pointing out that the dimensionality of the data often restricts the visualization process. Is it good practice to visualize higher-dimensional data by dividing it into lower dimensions?

I have attached a sample plot for a two-class problem depicting the distribution of four different features. What inferences can be drawn about the choice of classifier from the scatter plots?

Thanks for the tool. I will explore it using my data and get back to you with any issues.

Ingo Siegert

There are two possible strategies. a) Use what you know about the underlying process that produced your data. If your data also have a dynamic component, meaning that their characteristics also depend on time (e.g. in speech, I can say "veeery" slowly or "very" fast; the result has the same characteristics, but my classifier should be aware of this dynamic time warping), then I should use a dynamic classifier such as a Hidden Markov Model. If this is not the case, then I can try static classifiers such as neural networks or SVMs.
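To make the "veeery" versus "very" point concrete, here is a minimal sketch of a dynamic time warping (DTW) distance in plain Python. Note that this is DTW, not a Hidden Markov Model; it is only meant to show why a time-aware comparison treats a slow and a fast version of the same pattern as equal. The sequences are made up for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


fast = [0, 1, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]  # same shape, each value held twice
other = [2, 1, 0, 1, 2]                # different shape, same length as `fast`

print(dtw_distance(fast, slow))   # the time-stretched copy aligns at no cost
print(dtw_distance(fast, other))  # a genuinely different shape does not
```

A static classifier fed the raw samples would see `fast` and `slow` as vectors of different length, which is why time-dependent data usually need either a dynamic model or features that are invariant to such warping.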

But as far as I know, there is no statistical test that decides this question. One method that is often mentioned in this discussion is to compare different classifiers on the same features and then use the one with the best performance. But this requires knowing the optimal parameter settings for all of the classifiers being investigated, which one would normally determine only after a suitable classifier has been selected.
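A minimal sketch of that comparison, in plain Python: several candidate classifiers are scored on the same features with leave-one-out, then ranked. The two toy classifiers and the data are made up for illustration, and no parameters are tuned, which is exactly the caveat raised above.

```python
from collections import Counter


def majority_predict(X_train, y_train, x):
    """Baseline: always predict the most common training label."""
    return Counter(y_train).most_common(1)[0][0]


def one_nn_predict(X_train, y_train, x):
    """1-nearest-neighbour with squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]


def loo_accuracy(predict_one, X, y):
    """Leave-one-out accuracy: hold out each sample once."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += predict_one(X_train, y_train, X[i]) == y[i]
    return hits / len(X)


def rank_classifiers(candidates, X, y):
    """Score every candidate on the same data and sort, best first."""
    scores = {name: loo_accuracy(fn, X, y) for name, fn in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Tiny made-up two-class data set (illustration only).
X = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0], [5.1, 4.9], [0.1, 0.3], [4.8, 5.0]]
y = ["a", "b", "a", "b", "a", "b"]
ranking = rank_classifiers({"majority": majority_predict, "1-NN": one_nn_predict}, X, y)
print(ranking)
```

Including a trivial baseline such as `majority_predict` is a cheap sanity check: a candidate that cannot beat it is not worth tuning.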

For static classification tasks, you can also use WEKA. It is a data-mining tool, but it also includes tools for data pre-processing, classification, regression, clustering, association rules, and visualization (http://www.cs.waikato.ac.nz/ml/weka/).

Luís Torgo