How do you know whether the reduced number of features is sufficient? Is there any rule of thumb for it? Any good ideas in this direction are highly appreciated.
It depends on which dimensionality reduction method you are using. If you are using PCA or SVD, you can measure how much of the variance your reduced number of dimensions retains (95% is a common target). If you mean this from a feature selection perspective, it is quite subjective and will depend on your feature selection method, the types of features, and the domain. What I usually do is run experiments with all the features, then use correlation analysis, information gain, gain ratio, etc. to identify redundant features, remove them, and run the experiments again. This tells you how many features you can remove while still maintaining decent results. Hope this helps!
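As a rough sketch of the PCA/SVD variance criterion mentioned above: compute the per-component variance from the singular values of the centred data and keep the smallest number of components whose cumulative share reaches the threshold. The function name and the synthetic data here are my own illustration, not from any particular library.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`."""
    Xc = X - X.mean(axis=0)                      # centre the data
    # squared singular values are proportional to per-component variance
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # first index where the cumulative ratio reaches the threshold
    return int(np.searchsorted(cumulative, threshold) + 1)

rng = np.random.default_rng(0)
# synthetic data: four features with sharply decreasing variance
X = rng.normal(size=(500, 4)) * np.array([10.0, 5.0, 1.0, 0.1])
k = n_components_for_variance(X, threshold=0.95)
print(k)  # here the first two components already explain >95% of the variance
```

With scikit-learn you would get the same number from `PCA(n_components=0.95)`, which accepts a variance fraction directly.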
Put simply, imagine that you select two features. While testing, you realise that the values of both features increase or decrease together. This means the two features behave in the same way, so if you keep one and discard the other, there will be no effect on the results.
To judge how much dimensionality reduction is enough, look at the results. If they are good enough, the redundant features have been removed and the system is working well.
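The "both features move together" idea above is exactly what a pairwise correlation check captures: if two features are almost perfectly correlated, one of them can be dropped. A minimal sketch of that heuristic (the function, names, and 0.95 cut-off are my own illustration):

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Greedily drop the later feature of any pair whose absolute
    Pearson correlation exceeds `threshold` (a common heuristic)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # keep feature j only if it is not too correlated with anything kept so far
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = 2.0 * a + rng.normal(scale=0.01, size=300)   # near-duplicate of a
c = rng.normal(size=300)                          # independent feature
X = np.column_stack([a, b, c])
print(drop_correlated(X, ["a", "b", "c"]))        # b is redundant given a
```

After dropping, rerun your experiments as suggested above to confirm the results are unchanged.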
As noted above, explained variance is the best indicator for deciding how far to reduce the feature dimensionality.
Could you elaborate on your question: "... is sufficient?" Sufficient for what purpose? What do you intend to do with the reduced feature set? Whether a given reduced set is sufficient depends on what you do with it and how successful that attempt is, so stating the intended purpose may help elicit a more focused response.
I agree with Ahmed that Rough Set Theory (RST) actually deals with the sufficiency of selected features; the relevant concept is known as a reduct.
It gives you something more than a rule of thumb: RST provides a mathematical basis for showing that a certain subset of features is sufficient for the task at hand (usually classification).
However, one prerequisite for RST reduct-finding algorithms to work (there is no free lunch) is that continuous (real-valued) features must be discretised first. Then again, discretisation is needed in practically every classification (or even unsupervised learning) algorithm of this kind.
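The core RST sufficiency test is easy to state on an already-discretised decision table: a feature subset is sufficient if no two objects agree on all features in the subset yet carry different decision labels. The sketch below shows only that consistency check, not a full reduct-finding algorithm; the function name and toy table are my own illustration.

```python
def preserves_decision(rows, labels, subset):
    """RST-style sufficiency check: a feature subset is sufficient if no
    two objects agree on every subset feature but differ in label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        # setdefault stores the first label seen for this key;
        # a mismatch later means the subset cannot separate the classes
        if seen.setdefault(key, label) != label:
            return False
    return True

# toy decision table with three discrete features and a class label
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]
labels = ["a", "a", "b", "b"]
print(preserves_decision(rows, labels, [0]))   # feature 0 alone is sufficient
print(preserves_decision(rows, labels, [2]))   # feature 2 alone is not
```

A reduct is then a minimal subset that passes this check; reduct-finding algorithms search over subsets for exactly this property.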