Sorry for the late answer, and thank you for your helpful comment. I'm supposed to do web page classification and I don't have access to any data set. I think plain text could be enough. I'll take a look at the link and ask about it.
You can do it the opposite way. Web directories, such as those provided by Yahoo! and the dmoz Open Directory Project, provide an efficient way to browse for information within a predefined set of categories. You could navigate through these web directories, store the links listed on each category page, and then try to recover each linked page's category from its contents.
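As a rough illustration of the link-collection step, here is a minimal Python sketch using only the standard library. It assumes you have already downloaded the HTML of a category page; the function name `labeled_links`, the sample HTML, and the `example.com` style URLs are all made up for the example and do not reflect the real markup of Yahoo! or dmoz. It keeps only links that leave the directory site and pairs each one with the category of the page it was found on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def labeled_links(category, page_url, page_html):
    """Return (url, category) pairs for links that leave the directory site."""
    collector = LinkCollector()
    collector.feed(page_html)
    directory_host = urlparse(page_url).netloc
    pairs = []
    for href in collector.hrefs:
        url = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(url)
        # Skip navigation links that stay inside the directory itself.
        if parts.scheme in ("http", "https") and parts.netloc != directory_host:
            pairs.append((url, category))
    return pairs


# Hypothetical category page, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<a href="/Science/Physics/">Physics (subcategory)</a>
<a href="http://physics.example.org/">A physics site</a>
<a href="https://astro.example.net/intro.html">An astronomy primer</a>
"""

pairs = labeled_links("Science", "http://directory.example.com/Science/", SAMPLE_HTML)
for url, category in pairs:
    print(category, url)
```

Each resulting (url, category) pair could then be fetched and reduced to plain text to serve as one labeled training example. Fetching, rate limiting, and checking the directory's terms of use are left out of this sketch.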