Text clustering based on the vector space model is another technique for spam detection. The method automatically computes disjoint clusters over all spam and non-spam mails using a spherical k-means algorithm, and obtains the centroid vector of each cluster to extract a cluster description. Each centroid vector is assigned a label ('spam' or 'non-spam') based on the number of spam emails in its cluster. When a new mail arrives, the cosine similarity between the new mail vector and each centroid vector is calculated, and the label of the most relevant (most similar) cluster is assigned to the new mail. With this method, one can extract many kinds of topics in spam and non-spam email and detect spam efficiently.
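The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation from any particular paper: mails are tokenized on whitespace, weighted by raw term frequency, and a cluster is labeled 'spam' when more than half of its members are spam (a majority vote is assumed here; the text only says the label depends on the spam count). All function names and the `init_idx` parameter are hypothetical.

```python
import math
import random
from collections import Counter

def vectorize(text):
    """Term-frequency vector as a dict, scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}

def cosine(u, v):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * v.get(w, 0.0) for w, x in u.items())

def normalized_sum(vectors):
    """Sum the member vectors and project the result back onto the unit sphere."""
    total = Counter()
    for v in vectors:
        for w, x in v.items():
            total[w] += x
    norm = math.sqrt(sum(x * x for x in total.values()))
    return {w: x / norm for w, x in total.items()} if norm else {}

def nearest(v, centroids):
    """Index of the centroid with the highest cosine similarity to v."""
    return max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))

def spherical_kmeans(vectors, k, init_idx=None, n_iter=20, seed=0):
    """Cluster unit vectors by cosine similarity; returns (centroids, assignments)."""
    if init_idx is None:
        # Assumption: random mails serve as the initial centroids.
        init_idx = random.Random(seed).sample(range(len(vectors)), k)
    centroids = [dict(vectors[i]) for i in init_idx]
    for _ in range(n_iter):
        assign = [nearest(v, centroids) for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = normalized_sum(members)
    assign = [nearest(v, centroids) for v in vectors]
    return centroids, assign

def label_clusters(assign, is_spam, k):
    """Label each cluster 'spam' if spam mails are a strict majority (assumed rule)."""
    labels = []
    for j in range(k):
        flags = [s for s, a in zip(is_spam, assign) if a == j]
        labels.append("spam" if sum(flags) * 2 > len(flags) else "non-spam")
    return labels

def classify(text, centroids, labels):
    """Assign a new mail the label of its most similar centroid."""
    return labels[nearest(vectorize(text), centroids)]
```

In practice one would likely swap the raw term counts for TF-IDF weights and use more clusters than classes, so that each cluster captures a single topic within spam or non-spam mail.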
Most email spam techniques would work; however, where we are trying to find the spam also matters. On social media, Twitter is a different ball game from an Amazon review. In our recent studies we also saw that, in the case of tweets, people are focusing on finding spammers rather than individual spam messages.