I want to preprocess a dataset to extract summary information from it, e.g. the number of occurrences of each value. Which artificial intelligence technique would you recommend?
I would also suggest normalizing the data (the values of the attributes), especially if you are using any ML algorithms that are based on information gain.
Nihad, your question is valid: you want to check old data against a possible new interpretation. The goal, "the occurrence of each value", is similar to finding the CDF after you sort the data in ascending or descending order. If you can build a Lorenz curve as a data table, or as a deciles table plus the value of the mean, then post it here, without explaining units or research details, so that we may reprocess it. The method I use is a non-linear, non-parametric one for univariate distributions, so it may help in these particular cases. In my opinion, doing this for the main variable (selected according to your criteria) is always the first step of any multivariable analysis. Thanks, emilio
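For anyone who wants to prepare the table requested above, here is a minimal Python sketch of a deciles table plus the mean. It is only an illustration of the table format, not the non-parametric method mentioned in the post. The function name `lorenz_deciles` and the sample values are hypothetical, and it assumes non-negative values with a positive total and at least ten observations.

```python
def lorenz_deciles(values):
    """Return the mean and a Lorenz-style deciles table.

    Assumes non-negative values whose sum is positive (an assumption
    of this sketch, not something stated in the thread).
    """
    data = sorted(values)            # ascending order, as a Lorenz curve requires
    n = len(data)
    total = sum(data)
    mean = total / n

    table = []
    for k in range(1, 11):
        cutoff = (k * n) // 10       # observations in the lowest k tenths
        share = sum(data[:cutoff]) / total   # cumulative share of the total
        table.append((k / 10, share))
    return mean, table


# Made-up sample, for illustration only.
sample = [7, 1, 9, 3, 5, 2, 10, 4, 8, 6]
mean, table = lorenz_deciles(sample)

for population_share, value_share in table:
    print(f"{population_share:.1f}  {value_share:.4f}")
print("mean =", mean)
```

Each row gives the cumulative share of the population and the cumulative share of the total held by that lowest fraction; the last row is always (1.0, 1.0).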
I don't see any Artificial Intelligence here, when all you have to do is find the empirical Probability Density Function (PDF) or the Cumulative Distribution Function (CDF) of a single variable, or both. Why the 'big words', then?
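To make the point concrete, counting the occurrences of each value and turning the counts into an empirical distribution takes only a few lines of plain Python. This is a rough sketch for a single discrete variable: the function name `empirical_distribution` and the sample data are hypothetical, and for a continuous variable you would bin the values first (for example with a histogram) rather than count exact matches.

```python
from collections import Counter
from itertools import accumulate


def empirical_distribution(values):
    """Return sorted distinct values, their counts, relative
    frequencies, and the empirical CDF for one discrete variable."""
    tally = Counter(values)                  # occurrences of each value
    n = len(values)
    xs = sorted(tally)                       # distinct values, ascending
    counts = [tally[x] for x in xs]
    freqs = [c / n for c in counts]          # empirical probability of each value
    # Build the CDF from cumulative counts to avoid rounding drift.
    cdf = [c / n for c in accumulate(counts)]
    return xs, counts, freqs, cdf


# Made-up sample, for illustration only.
sample = [3, 1, 2, 3, 3, 2]
xs, counts, freqs, cdf = empirical_distribution(sample)

for x, c, f, F in zip(xs, counts, freqs, cdf):
    print(x, c, round(f, 4), round(F, 4))
```

The `counts` column answers the original question directly; the `cdf` column is the same information accumulated over the sorted values.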