What is the most suitable machine learning approach for classifying a website? Text mining techniques can be applied to the page content, but that content is not limited to a single language, and it also includes images and video. In addition, web content is semi-structured, unlike the plain text of an ordinary document.
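
To make the text-only starting point concrete, here is a minimal sketch, assuming Python and only its standard library. It flattens a page's HTML into character n-gram counts, each tagged with the field it came from (title or body). Character n-grams avoid language-specific tokenization, and the field tag keeps a little of the page structure. The names `FieldTextExtractor`, `char_ngrams`, and `page_features` are hypothetical, and the sketch ignores image and video content apart from reading `alt` text.

```python
from collections import Counter
from html.parser import HTMLParser


class FieldTextExtractor(HTMLParser):
    """Collects visible text, keeping <title> text separate from the rest.

    Hypothetical helper for illustration only.
    """

    SKIP = {"script", "style"}  # content of these tags is not visible text

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "body": []}
        self._stack = []  # open tags we care about (title, script, style)

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP or tag == "title":
            self._stack.append(tag)
        # Alt text is the only textual hint this sketch takes from an image.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.fields["body"].append(alt)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        current = self._stack[-1] if self._stack else None
        if current in self.SKIP:
            return
        field = "title" if current == "title" else "body"
        self.fields[field].append(text)


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of lowercased, space-normalized text."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def page_features(html, n=3):
    """Map an HTML string to a Counter of field-tagged character n-grams."""
    parser = FieldTextExtractor()
    parser.feed(html)
    parser.close()
    features = Counter()
    for field, chunks in parser.fields.items():
        for gram in char_ngrams(" ".join(chunks), n):
            features[f"{field}:{gram}"] += 1
    return features
```

The resulting counts could be fed to any standard text classifier. Whether this is adequate, and how image and video content should be brought in, is exactly what the question is asking.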