You should learn how to use regular expressions to filter out punctuation marks and similar characters. This is the easiest way to do it; however, first make sure you won't need the punctuation later, for example to split the text into sentences or to parse it, because those steps rely on it.
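As a rough illustration, here is a minimal sketch using Python's standard `re` module. The function names `strip_punctuation` and `split_sentences` are made up for this example, and the patterns are only one possible choice; the sentence splitter in particular is deliberately naive and will stumble on abbreviations such as "e.g.".

```python
import re

def strip_punctuation(text):
    """Hypothetical helper: drop anything that is not a word character or whitespace."""
    # [^\w\s] matches punctuation and symbols; note that \w keeps underscores
    # and that apostrophes are removed too ("don't" becomes "dont").
    no_punct = re.sub(r"[^\w\s]", "", text)
    # Collapse any runs of whitespace left behind by the removed marks.
    return re.sub(r"\s+", " ", no_punct).strip()

def split_sentences(text):
    """Hypothetical helper: naive split after ., ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

text = "Hello, world! How are you?"

# Split into sentences first, while the punctuation is still there,
# and only then strip it from each sentence.
sentences = [strip_punctuation(s) for s in split_sentences(text)]
print(sentences)
```

If the punctuation is stripped before splitting, the sentence boundaries are gone and `split_sentences` returns the whole text as a single item, which is why the order of the two steps matters.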