It depends on the features you have selected. What type of features are you using? Are they behavior-based, based on signature opcodes, a mixture of both, or something else?
I will give you the good, the bad, and the ugly on the subject from my perspective.
The good is that the objective of machine learning is to generalize: you feed the algorithm data with the intent that it will generalize beyond the samples it has seen.
When you categorize malware, you attribute certain features to a piece of malware, and its variants will fall under the same category. The important thing is to have data that represents the core attributes of the category, so that the machine learning algorithm can learn those features and treat the remaining traits as mere noise in the data. In a sense, this covers the objective of detecting polymorphic malware.
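To make the "core attributes versus noise" idea concrete, here is a minimal sketch. It is a toy stand-in for a real learner, not an actual detection method: it keeps the opcode bigrams shared by every training sample of a family and scores a new sample by how much of that core it contains. The opcode traces and the function names (`opcode_bigrams`, `family_core`, `core_coverage`) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]

def opcode_bigrams(opcodes: List[str]) -> Set[Bigram]:
    """Return the set of adjacent opcode pairs in a trace."""
    return set(zip(opcodes, opcodes[1:]))

def family_core(samples: Iterable[List[str]]) -> Set[Bigram]:
    """Bigrams present in every training sample of a family (the 'core')."""
    feature_sets = [opcode_bigrams(s) for s in samples]
    return set.intersection(*feature_sets)

def core_coverage(sample: List[str], core: Set[Bigram]) -> float:
    """Fraction of the family core that appears in the sample."""
    if not core:
        return 0.0
    return len(opcode_bigrams(sample) & core) / len(core)

# Hypothetical traces: three known variants of one family.
family = [
    ["push", "mov", "xor", "call", "ret"],
    ["push", "mov", "xor", "call", "nop", "ret"],
    ["nop", "push", "mov", "xor", "call", "ret"],
]
core = family_core(family)

# An unseen variant padded with junk instructions, and an unrelated trace.
variant = ["nop", "nop", "push", "mov", "xor", "call", "jmp", "ret"]
benign = ["push", "add", "sub", "pop", "ret"]

print(sorted(core))
print(core_coverage(variant, core), core_coverage(benign, core))
```

The junk `nop` and `jmp` instructions in the unseen variant do not affect its score, because only the shared core is compared. A real classifier would learn feature weights statistically rather than take a strict intersection, and benign code that happens to contain the same bigrams would still match.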
The bad is that most generic behavior-based features must be so general that they can raise the false positive rate (FPR). This has hampered the commercial adoption of purely behavior-based systems in the past and will most likely not improve with machine learning (though signature/behavior hybrids are in use).
The ugly, in my opinion, is that machine learning intrusion detection is worse off than signature-based systems, due to the increased FPR and the fact that a model will most likely only be able to detect malware already represented in its training data. Most training data sets, such as 1999 DARPA, can only be used to detect already known attacks. More general behavior-based data sets will more than likely increase not just the FPR but also the FNR (false negative rate).
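For reference, the two error rates are computed from confusion-matrix counts as FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The sketch below uses made-up counts (they are assumptions, not measurements from any real detector) to show why even a small FPR is painful when benign events vastly outnumber malicious ones.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int):
    """Return (FPR, FNR) from confusion-matrix counts."""
    fpr = fp / (fp + tn)  # share of benign events flagged as malicious
    fnr = fn / (fn + tp)  # share of malicious events that were missed
    return fpr, fnr

# Hypothetical counts: 1,000,000 benign events and 100 malicious ones.
tp, fn = 90, 10
fp, tn = 10_000, 990_000

fpr, fnr = error_rates(tp, fp, tn, fn)
precision = tp / (tp + fp)  # share of alerts that are real attacks

print(f"FPR={fpr:.2%}  FNR={fnr:.2%}  precision={precision:.2%}")
```

With these assumed numbers, an FPR of only 1% still produces 10,000 false alarms against 90 true detections, so fewer than 1 in 100 alerts is a real attack.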