This is a typical information retrieval problem, similar to the ones search engines such as Google deal with. Your keywords are the queries and your websites are the documents. Term Frequency-Inverse Document Frequency (TF-IDF) is the technique usually used in this situation. In simple terms, TF-IDF ranks the documents by how often the keywords occur in each website, while also taking into account how common each keyword is across all websites. For example, the keyword "the" most likely appears in many websites, while the keyword "U.S. elections" is probably much less common, so a match on "U.S. elections" counts for more. So again, the answer to your question is TF-IDF, and you can read more about it here:
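A minimal, dependency-free sketch of the idea described above, assuming a naive whitespace tokenizer, term frequency normalised by document length, and IDF computed as log(N / df). The function names and the three sample "websites" are hypothetical, chosen only for illustration; libraries such as scikit-learn use slightly different (smoothed) formulas.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def idf(docs_tokens):
    """Inverse document frequency log(N / df) for every term in the corpus."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}

def rank_documents(query, docs):
    """Return (doc_index, score) pairs sorted by descending TF-IDF score."""
    docs_tokens = [tokenize(d) for d in docs]
    weights = idf(docs_tokens)
    scores = []
    for i, tokens in enumerate(docs_tokens):
        tf = Counter(tokens)
        length = len(tokens) or 1
        # Sum the TF-IDF weight of every query keyword found in the document.
        score = sum((tf[t] / length) * weights.get(t, 0.0) for t in tokenize(query))
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

websites = [
    "the election results and the election debate",
    "the best travel deals and the cheapest flights",
    "the online shop sells shoes",
]
print(rank_documents("the election", websites))
```

Because "the" occurs in every sample website, its IDF is log(3/3) = 0 and it contributes nothing to any score; only the rarer keyword "election" separates the first website from the others.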
You want to build a recommender system (rec-sys) based on content-based filtering. TF-IDF is the technique usually used in such methods. However, other techniques, such as supervised learning (e.g., ANN, ...) or unsupervised learning (e.g., clustering), can also be used.
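As a rough illustration of the clustering option, here is a minimal sketch that turns each page into an L2-normalised TF-IDF vector and groups the pages with plain k-means. The naive initialisation (the first k pages as starting centroids), the whitespace tokenizer, and the sample pages are assumptions made for brevity; a real corpus would need better preprocessing and initialisation (e.g., k-means++).

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Dense, L2-normalised TF-IDF vectors over the corpus vocabulary."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means; on unit-length vectors this behaves like cosine-based clustering."""
    centroids = [list(v) for v in vectors[:k]]  # naive init: first k points (assumption)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

pages = [
    "cheap flights hotel booking",
    "buy shoes online shop",
    "hotel booking travel deals",
    "online shop discount shoes",
]
print(kmeans(tfidf_vectors(pages), k=2))  # → [0, 1, 0, 1]
```

Note that TF-IDF treats the same word in two different languages as two unrelated terms, so on a multilingual corpus pages would tend to cluster by language rather than by topic unless the text is first translated or mapped into a shared representation.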
For more information, please see feature-aware methods.
Consider learning-to-rank algorithms, which identify a ranked list of related websites to recommend to a user after they have accessed the current website. Ranking SVM, AdaRank, LambdaRank, and RankNet are some of the best options.
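To make the pairwise idea behind RankNet concrete, the sketch below minimises the RankNet loss log(1 + exp(-(s_better - s_worse))) with plain gradient descent. As a simplification, it uses a linear scoring function instead of RankNet's neural network. The two features per candidate website (a content similarity score and a same-category flag), the site names, and the preference pairs are all hypothetical.

```python
import math

def score(w, x):
    """Linear scoring function (simplification: RankNet proper uses a neural net)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(pairs, n_features, lr=0.1, epochs=200):
    """pairs: list of (x_better, x_worse) feature vectors.
    Minimises log(1 + exp(-(s_better - s_worse))) by gradient descent."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = score(w, better) - score(w, worse)
            # dLoss/d(diff) = -1 / (1 + exp(diff)); clamp diff to avoid overflow.
            g = -1.0 / (1.0 + math.exp(min(diff, 50.0)))
            for i in range(n_features):
                w[i] -= lr * g * (better[i] - worse[i])
    return w

# Hypothetical features: [content_similarity, same_category]
candidates = {"site_a": [0.9, 1.0], "site_b": [0.5, 1.0], "site_c": [0.1, 0.0]}
pairs = [
    (candidates["site_a"], candidates["site_b"]),  # users preferred a over b
    (candidates["site_b"], candidates["site_c"]),
    (candidates["site_a"], candidates["site_c"]),
]
w = train_pairwise(pairs, n_features=2)
ranking = sorted(candidates, key=lambda name: score(w, candidates[name]), reverse=True)
print(ranking)  # → ['site_a', 'site_b', 'site_c']
```

The pairwise formulation only needs relative preferences (e.g., "users who left this website clicked A more often than B"), not absolute relevance labels, which is one reason these methods suit recommendation logs.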
This is a general question; you should be more specific. If you want to build a recommender system, first determine which approach you will use: content-based, collaborative filtering, or hybrid. Second, determine the dataset (domain) you will work on, so that you know the data and how to deal with it, e.g., whether it contains numerical ratings or not. Finally, determine which factors will be used to recommend items to the user.
You can also treat it as a classification problem and choose the top n recommended websites based on the highest similarity.
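A minimal sketch of the "top n by highest similarity" step, assuming each website is already represented as a sparse vector of term weights (for example TF-IDF) stored as a dictionary. The catalogue contents and the function names are hypothetical.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_n_similar(current, catalogue, n=3):
    """Return the n website names most similar to `current`, best first."""
    others = (name for name in catalogue if name != current)  # never recommend itself
    return heapq.nlargest(
        n, others, key=lambda name: cosine(catalogue[current], catalogue[name])
    )

catalogue = {
    "news_a": {"election": 2.0, "vote": 1.0},
    "news_b": {"election": 1.0, "debate": 1.0},
    "shop_a": {"shoes": 2.0, "discount": 1.0},
    "travel_a": {"flights": 1.0, "hotel": 1.0},
}
print(top_n_similar("news_a", catalogue, n=2))
```

Here "news_b" is returned first because it is the only other website sharing a term ("election") with "news_a"; `heapq.nlargest` avoids sorting the whole catalogue when n is small.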
Sumaia M. Al-Ghuribi, let me split the question and be more specific. I have collected text data from 500 websites of different categories (e.g., news, online shopping, travel agencies, and so on). The problem is that this text corpus is multilingual, and I want to cluster the websites based on similarity. What is the best way to do it?