Is there a program or mining platform that can process a huge amount of data, such as a text corpus containing at least one million instances, as a single batch?
Thank you very much for your answer. I ran into this problem while mining sentiment from Twitter posts: the corpus had nearly 150,000 instances, and I was using Weka with the maximum heap size increased to 3072 MB. Even so, the laptop hung, execution stopped, and Weka closed.
In addition, could you recommend hardware specifications for handling huge text data sets, whether on a laptop or a desktop?
I don't know, but may I ask how I can use the Python programming language for a mining project? Is there a special Python package for mining work, or is it done with standard programming instructions only?
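To make the question concrete, here is a minimal sketch of what I imagine the "standard instructions" route could look like: reading the corpus in small batches instead of one big batch, so that only one batch is in memory at a time. It uses only the Python standard library. The function names, the token pattern, and the batch size are my own assumptions for illustration, not part of any mining package, and it only counts words rather than doing real sentiment analysis.

```python
from collections import Counter
from itertools import islice
import re

# Assumption: a very simple tokenizer (lowercase letters and apostrophes only).
TOKEN_RE = re.compile(r"[a-z']+")


def read_batches(lines, batch_size=10_000):
    """Yield lists of at most batch_size items from any iterable of lines."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_tokens(lines, batch_size=10_000):
    """Count token frequencies while holding only one batch of lines in memory."""
    totals = Counter()
    for batch in read_batches(lines, batch_size):
        for line in batch:
            totals.update(TOKEN_RE.findall(line.lower()))
    return totals


if __name__ == "__main__":
    # Tiny made-up sample; a real corpus would be streamed from a file.
    sample = ["I love this phone", "I hate this weather", "love love love"]
    counts = count_tokens(sample, batch_size=2)
    print(counts.most_common(3))
```

For a real corpus, an open file object could be passed in place of the list, for example `with open("tweets.txt", encoding="utf-8") as f: count_tokens(f)`, where `tweets.txt` is a hypothetical file with one post per line. Packages such as NLTK and scikit-learn also exist for text mining in Python, but I have not tried them on data of this size.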