I am currently working on some real-life market data. The dataset consists of around 1200 samples with 20 attributes describing each sample. I am facing two problems:
1) The dataset is a text dataset, and I am having a lot of trouble parsing it. Could anyone recommend a methodology for handling this kind of data?
2) Standardization of the dataset: The attributes are not standardized. For example, in Colour there are values such as "Blue, Black" for one sample and "Black, Blue" for another, which should be treated as the same value. How do I go about this?
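
To make the second problem concrete, here is a minimal Python sketch of the kind of normalization I have in mind. It assumes the values are comma-separated and that neither order nor letter case matters, which may not hold for every attribute. The function name `normalize_multivalue` and the sample strings are made up for illustration only.

```python
def normalize_multivalue(value, sep=","):
    """Return a canonical form of a multi-valued attribute string.

    Assumption: items are separated by `sep`, and order, case, and
    surrounding whitespace carry no meaning.
    """
    # Split into items, then trim whitespace and lowercase each one.
    parts = [p.strip().lower() for p in value.split(sep)]
    # Drop empty items, remove duplicates, and sort for a stable order.
    unique = sorted({p for p in parts if p})
    return ", ".join(unique)


# Hypothetical sample values, modelled on the example above.
samples = ["Blue, Black", "Black, Blue", " black ,BLUE "]
normalized = [normalize_multivalue(s) for s in samples]
print(normalized)  # all three collapse to 'black, blue'
```

With something like this, "Blue, Black" and "Black, Blue" both map to the same string, so they can be compared or grouped as one category. I am not sure whether this is the right approach for all 20 attributes, though.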