I have a file in which each line contains a Persian sentence, a tab, and then an English word giving the class of that sentence. I need to extract the 1000 most frequent words from the file and build a matrix from them. The columns of the matrix are the classes in the file (some files have 2 classes, some 3, some more) and the rows are the 1000 words (I've enclosed a sample picture). For each of the 1000 words and each class, the corresponding cell should be 1 if the word occurs in that class and 0 if it does not. How can I build this term-document matrix? I need it for LSA (latent semantic analysis) for word sense disambiguation, and I have to use Python 3. The matrix will then be passed to SVD.
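One possible way to build the matrix described above is sketched below. It is not a definitive solution: it assumes whitespace tokenization is good enough for the Persian text (a proper tokenizer or normalizer may work better), that the class label is everything after the last tab on each line, and that NumPy is available. The function name `build_term_class_matrix` and the sample sentences are made up for illustration.

```python
from collections import Counter

import numpy as np


def build_term_class_matrix(lines, top_n=1000):
    """Return (vocab, classes, matrix), where matrix[i, j] is 1 if
    vocab[i] occurs in any sentence of classes[j], and 0 otherwise."""
    word_counts = Counter()   # frequency of each word over the whole file
    words_by_class = {}       # class label -> set of words seen in that class

    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        sentence, label = line.rsplit("\t", 1)
        label = label.strip()
        tokens = sentence.split()  # assumption: plain whitespace tokenization
        word_counts.update(tokens)
        words_by_class.setdefault(label, set()).update(tokens)

    # Rows: the top_n most frequent words. Columns: the classes, sorted by name.
    vocab = [word for word, _ in word_counts.most_common(top_n)]
    classes = sorted(words_by_class)

    matrix = np.zeros((len(vocab), len(classes)), dtype=int)
    for j, label in enumerate(classes):
        for i, word in enumerate(vocab):
            if word in words_by_class[label]:
                matrix[i, j] = 1
    return vocab, classes, matrix


# Tiny made-up sample; with a real file, pass open(path, encoding="utf-8").
sample = [
    "کتاب خوب است\tpositive",
    "فیلم بد است\tnegative",
    "کتاب بد نیست\tnegative",
]
vocab, classes, matrix = build_term_class_matrix(sample, top_n=3)

# SVD of the term-class matrix, as needed for LSA.
U, S, Vt = np.linalg.svd(matrix.astype(float), full_matrices=False)
```

For a real file, the same function could be called with `top_n=1000` on the open file object. If counts rather than 0/1 values turn out to be more useful for LSA, the per-class sets could be replaced with per-class `Counter` objects.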
