I am adopting a dimensionality reduction method for cross-lingual information retrieval and classification. I am mainly interested in data sets with many classes, written in widely spoken languages.
Look at the data sets used in the Cross-Language Evaluation Forum (CLEF) and the Text Retrieval Conference (TREC). NTCIR also has some data for Asian languages, if you are interested in those as well.