I have Java code, already available on the Internet, that separates text into words (tokenization). But I also need an algorithm to remove numerical terms and punctuation marks. Is the Porter Stemmer algorithm enough to remove all numerical terms and punctuation marks?
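
For context, the Porter stemmer only strips common suffixes from English words (for example, "connected" becomes "connect"); it is not designed to remove numbers or punctuation, so that filtering is normally a separate step after tokenization. Below is a minimal sketch of such a step, assuming the tokens have already been split into a list. The class name `TokenCleaner` and the method `clean` are hypothetical names chosen for illustration, not part of any stemmer library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenCleaner {

    // Hypothetical helper: strips ASCII punctuation from each token and
    // drops tokens that end up empty or consist only of digits.
    public static List<String> clean(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // \p{Punct} matches ASCII punctuation such as . , ! ? ( ) -
            String stripped = token.replaceAll("\\p{Punct}", "");
            // Skip tokens that are now empty or purely numeric
            if (stripped.isEmpty() || stripped.matches("\\d+")) {
                continue;
            }
            kept.add(stripped);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Hello,", "world!", "2024", "3.14", "--", "stemming");
        // Prints the tokens that survive the filter
        System.out.println(clean(tokens));
    }
}
```

The cleaned list could then be passed to a stemmer. Note that this sketch also removes apostrophes and hyphens inside words, and keeps mixed tokens such as "abc123"; whether that is acceptable depends on the text being processed.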