I am currently working on Arabic morphological disambiguation. It is an unsupervised classification task, and both the training and the test sets contain imprecise attributes: a given attribute may have several possible values, all generated by a morphological analyzer. Neither set contains the words themselves. We only have the values of the morphological features (POS, gender, number, etc.) of the two preceding and the two following words, and we want to predict the features of the current word.
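To make the data layout concrete, here is a minimal sketch of one instance, assuming each ambiguous attribute is kept as a set of candidate values and flattened into a multi-hot vector (every candidate value switches on its own binary feature). The feature inventory, the value names, and the helper functions `build_index` and `encode` are hypothetical and only illustrate the idea. This is one possible encoding, not something a particular SVM tool prescribes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical feature inventory; the real one would come from the analyzer's tag set.
FEATURE_VALUES: Dict[str, List[str]] = {
    "pos": ["noun", "verb", "prep"],
    "gender": ["masc", "fem"],
    "number": ["sg", "pl"],
}
# Relative positions of the context words: two preceding and two following.
POSITIONS: List[int] = [-2, -1, 1, 2]

Key = Tuple[int, str, str]
Window = Dict[int, Dict[str, Set[str]]]


def build_index() -> Dict[Key, int]:
    """Assign one vector dimension to each (position, feature, value) triple."""
    index: Dict[Key, int] = {}
    for position in POSITIONS:
        for feature, values in FEATURE_VALUES.items():
            for value in values:
                index[(position, feature, value)] = len(index)
    return index


def encode(window: Window, index: Dict[Key, int]) -> List[int]:
    """Multi-hot encoding: set a 1 for every candidate value of every attribute."""
    vector = [0] * len(index)
    for position, features in window.items():
        for feature, candidates in features.items():
            for value in candidates:
                vector[index[(position, feature, value)]] = 1
    return vector


# One instance: the word at position -1 is ambiguous in POS and gender,
# and the word at position 2 is ambiguous in number.
window: Window = {
    -2: {"pos": {"noun"}, "gender": {"masc"}, "number": {"sg"}},
    -1: {"pos": {"noun", "verb"}, "gender": {"masc", "fem"}, "number": {"sg"}},
    1: {"pos": {"prep"}},
    2: {"pos": {"noun"}, "gender": {"fem"}, "number": {"sg", "pl"}},
}

index = build_index()
vector = encode(window, index)
```

In this sketch an unambiguous attribute sets exactly one dimension, an ambiguous one sets several, and an attribute the analyzer does not report (such as the gender of the preposition) leaves all of its dimensions at zero.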
Can SVM tools be trained on imprecise data?
How can I run an SVM on a data set that does not contain the words (only the features are provided)?
How can I specify ambiguous attributes (those with several possible values) in the test data set?