Use WEKA; it is simple and fast. You can use either the Explorer or the KnowledgeFlow interface for discretization, and you can set the parameters according to your needs.
Method 1 - a very simple method: choose the number of bins (B), divide the range of the data into B equal-width intervals, then assign the value 1 to all numbers that fall in the first interval, 2 to those in the second interval, and so on.
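Method 1 can be sketched in a few lines of Python. This is only an illustration: the function name `equal_width_bins` and the sample data are made up for the example, and the maximum value is placed in the last bin by convention.

```python
def equal_width_bins(values, num_bins):
    """Assign each value a bin label from 1 to num_bins using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values identical, so everything goes in bin 1.
        return [1] * len(values)
    width = (hi - lo) / num_bins
    labels = []
    for v in values:
        idx = int((v - lo) / width) + 1
        labels.append(min(idx, num_bins))  # the maximum falls in the last bin
    return labels


data = [0.0, 1.0, 2.5, 4.9, 5.0, 7.5, 10.0]
labels = equal_width_bins(data, 4)
print(labels)
```

Equal-width bins are sensitive to outliers, since a single extreme value stretches the range and can leave most bins nearly empty.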
Method 2 - if you know the number of categories: run k-means clustering on the continuous data to group it into k clusters, and use the cluster index as the discrete value.
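A minimal sketch of Method 2 for a single feature, using plain Lloyd iterations in one dimension. The name `kmeans_1d` and the deterministic initialization (centers spread over the sorted data) are assumptions made for the example; a library implementation would normally be used in practice.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm in one dimension; returns (labels, centers)."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the initial centers over the sorted data.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.8, 10.1, 10.4]
labels, centers = kmeans_1d(data, 3)
print(labels, centers)
```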
Method 3 - if you do not know the number of categories: run hierarchical clustering on the continuous data and choose the number of clusters k afterwards, for example by cutting the dendrogram at a suitable height.
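For a single feature, one simple variant of Method 3 is single-linkage clustering cut at a chosen height. In one dimension this reduces to splitting the sorted values wherever the gap between neighbours exceeds the threshold. The sketch below assumes single linkage and a hypothetical threshold `max_gap`; other linkages would need a full agglomerative implementation.

```python
def single_linkage_1d(values, max_gap):
    """Cut a 1-D single-linkage dendrogram at height max_gap.

    In one dimension, single linkage merges neighbouring sorted points, so
    cutting at max_gap is the same as splitting wherever the gap between
    consecutive sorted values exceeds max_gap.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > max_gap:
            current += 1  # a large gap starts a new category
        labels[nxt] = current
    return labels


data = [0.8, 5.0, 1.0, 9.8, 1.2, 5.3, 10.4]
labels = single_linkage_1d(data, max_gap=2.0)
num_categories = max(labels) + 1
print(labels, num_categories)
```

Here the number of categories is not fixed in advance; it follows from the threshold.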
Minimum Splits Based Discretization for Continuous Features
Methods for discretizing continuous data include Fayyad & Irani's MDL method, which uses mutual information to recursively define the best bins, as well as CAIM, CACC, Ameva, and many others.
The choice of method depends on the problem at hand.
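To give a flavour of the supervised, entropy-based approach, here is a rough Python sketch of a single step: finding the one cut point that minimizes the class entropy of the two resulting bins. It is not the full Fayyad & Irani procedure, which applies this step recursively and uses an MDL criterion to decide when to stop; the function names and the toy data are made up for the example.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def best_cut(values, classes):
    """Return (cut point, weighted class entropy) for the best single binary split."""
    pairs = sorted(zip(values, classes))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no boundary between identical values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if weighted < best[0]:
            # Place the cut halfway between the two neighbouring values.
            best = (weighted, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best[1], best[0]


values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
classes = ["a", "a", "a", "b", "b", "b"]
cut, cond_entropy = best_cut(values, classes)
print(cut, cond_entropy)
```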
Actually, I want to use different machine learning methods and compare them with each other. Some of them, such as decision trees, work with discrete features, so I decided to discretize the features.
How do you measure the quality of a discretization? Discretization always loses some of the information present in the continuous values, so a good method is one that keeps this loss small. You can apply clustering methods such as k-means, which minimizes the sum of squared deviations of the values from their cluster means. More generally, you can define a loss yourself and look for the discretization that minimizes it.
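As an illustration of such a loss, the Python sketch below computes the within-bin sum of squared deviations for a given assignment of values to bins, so that two candidate discretizations can be compared. The function name `within_bin_sse` and the example data are assumptions for the sake of the example.

```python
def within_bin_sse(values, labels):
    """Sum of squared deviations of each value from the mean of its bin."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    total = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
good = [0, 0, 0, 1, 1, 1]  # bins follow the two natural groups
poor = [0, 0, 1, 1, 1, 1]  # the value 3.0 is put in the wrong bin
loss_good = within_bin_sse(data, good)
loss_poor = within_bin_sse(data, poor)
print(loss_good, loss_poor)
```

A lower value means the bin representatives (the bin means) stay closer to the original values. Note that this loss always decreases as bins are added, so it is only meaningful when comparing discretizations with the same number of bins.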
I fully agree with M. Landasse that it is always preferable to ask first what the actual source of the data is. Still, there is a fairly robust method for discretization/symbolization that may work in a wide variety of cases: discretization through ordinal patterns, i.e. replacing each short segment of the signal by the rank order of its samples. See, for instance, the paper 'Permutation-information-theory approach to unveil delay dynamics from time-series analysis', Physical Review E, vol. 82, 2010.
In MATLAB, for a single 1D signal, it may be as simple as:
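s = s(:)';                  % force s to be a row vector
L = length(s);
n = 5; % window length, set arbitrarily to 5 consecutive samples -> segments of s
% are therefore mapped into 5! = 120 symbols
base = n.^(0:n-1)';         % column of weights for a base-n encoding of the ordering
y = zeros(1, L-n+1);        % preallocate the symbol stream
for i = 1:L-n+1
[~,ord] = sort(s(i:i+n-1)); % sort order of the samples in window i
y(i) = (ord-1)*base;        % encode the ordering as a single integer
end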
To discretize multivariate data, say N simultaneous signals (N channels), a simple adaptation is to apply this method to each channel separately, producing one stream of symbols per channel; these streams can then be recombined into a single symbolic stream.
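The multi-channel adaptation could look like the following Python sketch. The per-channel encoding mirrors the single-channel idea (sort order of each window, encoded in base n). The recombination step is not specified above, so pairing the per-channel symbols into tuples is only one possible choice, and the function names are hypothetical.

```python
def ordinal_symbols(signal, n):
    """Map each window of n consecutive samples to an integer encoding its sort order."""
    symbols = []
    for i in range(len(signal) - n + 1):
        window = signal[i:i + n]
        order = sorted(range(n), key=lambda k: window[k])  # 0-based, stable argsort
        symbols.append(sum(o * n ** p for p, o in enumerate(order)))
    return symbols


def combine_channels(channels, n):
    """Symbolize each channel, then pair the symbols time step by time step (an assumption)."""
    per_channel = [ordinal_symbols(ch, n) for ch in channels]
    return list(zip(*per_channel))


ch1 = [1.0, 2.0, 3.0, 2.0]
ch2 = [3.0, 2.0, 1.0, 2.0]
combined = combine_channels([ch1, ch2], n=3)
print(combined)
```

Each tuple can be treated as one composite symbol. With N channels the alphabet grows to at most (n!)^N symbols, so n should stay small when many channels are combined.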