What are the best methods for discretization of continuous features?

Ajay Kumar

Use WEKA; it's simple and fast. You can use either the Explorer or the KnowledgeFlow interface for discretization, and you can set different parameters according to your needs.

Oliver Sampson

KNIME implements the CAIM binner.

http://citeseer.ist.psu.edu/kurgan04caim.html

http://www.knime.org

There is also the LUCS-KDD software for association rule mining

https://cgi.csc.liv.ac.uk/~frans/KDD/Software/

Ahmad Hassanat

In addition to what Oliver listed:

Method 1- very simple method: choose the number of bins (B), divide the range of the data into B equal-width intervals, then assign the value 1 to all numbers in the first interval, 2 to those in the second interval, and so on.

Method 2- if you know the number of categories: run k-means clustering on the continuous data to group it into k clusters.

Method 3- if you do not know the number of categories: run hierarchical clustering on the continuous data and let the resulting hierarchy suggest how many clusters to keep.
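Methods 1 and 2 could look roughly like the plain-Python sketch below. The function names, the way the k-means centres are initialised, and the sample data are my own assumptions for illustration, not something from a particular library; Method 3 is not covered here.

```python
def equal_width_bins(values, n_bins):
    """Method 1: label each value 1..n_bins by equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)  # constant feature: everything falls in one bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def nearest_centre(v, centres):
    """Index of the centre closest to v."""
    return min(range(len(centres)), key=lambda c: abs(v - centres[c]))


def kmeans_1d_bins(values, k, n_iter=100):
    """Method 2: label each value 1..k by its nearest 1-D k-means centre."""
    ordered = sorted(values)
    # Assumed initialisation: centres spread evenly through the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        labels = [nearest_centre(v, centres) for v in values]
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous centre.
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return [nearest_centre(v, centres) + 1 for v in values], centres


data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.7, 10.0, 9.9]
print(equal_width_bins(data, 3))
print(kmeans_1d_bins(data, 3)[0])
```

Equal-width bins are cheap but sensitive to outliers, since one extreme value stretches every interval; the clustering version adapts the bin edges to where the data actually sits.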

Dr. Indrajit Mandal

hello friend

There are some methods, such as Minimum Splits Based Discretization for Continuous Features.

Other methods for discretizing continuous data include Fayyad & Irani's MDL method, which uses mutual information to recursively define the best bins, as well as CAIM, CACC, Ameva, and many others.

The choice of method depends on the problem at hand.
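To give a feel for the supervised, entropy-based family that Fayyad & Irani's method belongs to, here is a minimal sketch that finds the single cut point with the lowest class-weighted entropy. It is not the full MDL method: the recursive splitting and the MDL stopping criterion are left out, and the function names and sample data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, weighted_entropy) for the best binary split, or None if no split exists."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    total = len(pairs)
    best = None
    for i in range(1, total):
        left_v, right_v = pairs[i - 1][0], pairs[i][0]
        if left_v == right_v:
            continue  # cannot cut between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Class entropy of each side, weighted by the share of samples on that side.
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / total
        if best is None or score < best[1]:
            best = ((left_v + right_v) / 2, score)  # cut at the midpoint
    return best


values = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
classes = ["a", "a", "a", "b", "b", "b"]
print(best_cut_point(values, classes))
```

Unlike equal-width binning, this kind of method needs class labels, so it only applies when the discretization is meant to serve a classification task.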

All the best

Ajay Kumar

Fabrice Clerot

see

M. Boullé. Khiops: a Statistical Discretization Method of Continuous Attributes. Machine Learning, 55(1):53-69, 2004

http://perso.rd.francetelecom.fr/boulle/publications/BoulleML04.pdf

the technique is implemented in the Khiops tool

http://www.khiops.com

Amaury Lendasse

Could you tell why you want to discretize features? That's a very interesting topic!

Sajjad Fouladvand

I appreciate all comments.

Hello Dear Lendasse,

Many thanks for your comment.

Actually, I want to use different machine learning methods and compare them with each other. Some of them, like decision trees, work with discrete features, so I decided to discretize the features.

Viswanath Pulabaigari

How do you measure the quality of a discretization? Discretization loses some of the information present in the continuous values, so a method that minimizes this loss is a good one. You can apply clustering methods such as k-means, which minimizes the sum of squared deviations (the loss). More generally, you can define a loss and look for the discretization that minimizes it.
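As a small illustration of that idea, the sketch below scores a given binning by the sum of squared deviations of each value from the mean of its bin, so two candidate discretizations can be compared. The function name and the toy data are assumptions for the example.

```python
def within_bin_sse(values, bin_labels):
    """Sum of squared deviations of each value from the mean of its bin."""
    bins = {}
    for v, b in zip(values, bin_labels):
        bins.setdefault(b, []).append(v)
    sse = 0.0
    for members in bins.values():
        mean = sum(members) / len(members)
        sse += sum((v - mean) ** 2 for v in members)
    return sse


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
natural_split = [1, 1, 1, 2, 2, 2]  # bins follow the two obvious groups
shifted_split = [1, 1, 2, 2, 2, 2]  # the value 3.0 is pushed into the upper bin

print(within_bin_sse(data, natural_split))  # 4.0
print(within_bin_sse(data, shifted_split))  # 50.5
```

The loss always drops as the number of bins grows, so it is only meaningful for comparing discretizations with the same number of bins.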

Jugurta Montalvao

Dear S. Fouladvand

I fully agree with Mr. Lendasse that it is always preferable to ask first what the actual source of the data is. Still, there is a fairly robust method for discretization/symbolization that may work in a wide variety of cases: discretization through ordering. See, for instance, the paper 'Permutation-information-theory approach to unveil delay dynamics from time-series analysis', PHYSICAL REVIEW E, n. 82, 2010.

In Matlab code, for a single 1D signal, it may be as simple as:

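s = s(:)';                        % force a row vector
L = length(s);
n = 5;                            % window length, chosen arbitrarily: segments of 5
                                  % consecutive samples are ordered and therefore
                                  % mapped into 5! = 120 symbols
base = n.^(0:n-1)';               % weights that encode an ordering as a base-n number
y = zeros(1, L-n+1);              % preallocate the symbol stream
for i = 1:L-n+1
    [~, ord] = sort(s(i:i+n-1));  % ordering of the i-th window
    y(i) = (ord-1)*base;          % one integer symbol per window
end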

To discretize multivariate data, say N simultaneous signals (N channels), a simple adaptation is to apply this method to each channel separately, producing one stream of symbols per channel; these streams can then be recombined into a single symbolic stream.
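For readers without Matlab, here is a rough Python sketch of the same ordering idea, extended to several channels. It keeps each ordering as a tuple instead of encoding it as a base-n integer, and it recombines the channels by pairing their symbols at each time step; that recombination rule and the function names are my assumptions, since the text above does not fix them.

```python
def ordinal_symbols(signal, n=3):
    """Map each window of n consecutive samples to its ordering pattern."""
    symbols = []
    for i in range(len(signal) - n + 1):
        window = signal[i:i + n]
        # Window positions sorted by sample value; ties keep their original order.
        order = tuple(sorted(range(n), key=lambda j: window[j]))
        symbols.append(order)
    return symbols


def multichannel_symbols(channels, n=3):
    """Symbolize each channel separately, then pair the symbols per time step."""
    per_channel = [ordinal_symbols(ch, n) for ch in channels]
    return list(zip(*per_channel))


rising_noisy = [1, 3, 2, 4]
falling = [4, 3, 2, 1]
print(ordinal_symbols(rising_noisy))                   # [(0, 2, 1), (1, 0, 2)]
print(multichannel_symbols([rising_noisy, falling]))
```

Because only the ordering inside each window matters, the symbols do not change under any monotonic rescaling of the signal, which is part of what makes this approach robust.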

I hope it may help you.

You can use WEKA Tool for data discretization

interesting!

Francisco Bischoff

To discretize continuous data, just do this:

max_data = max(data)
discr_data = floor(data * n_bins / max_data)

Now your data are integers from 0 to n_bins. This assumes the data are non-negative; only the maximum value maps to n_bins.