04 April 2014

I am using machine learning techniques to train models on different training candidates, each of which is a combination of the original datasets. I create the training candidates by combining no more than three datasets at a time. I use text mining in RapidMiner to extract terms from bug summaries and use those terms to train the model.

My problem is that when I add a dataset to the training set, the number of terms common to the training and testing datasets decreases compared with the number of common terms for a single training dataset, and the F-measure either increases or decreases independently of the number of terms.

What is the reason behind this? Why does the number of common terms decrease when datasets are added, and why does the F-measure not follow a consistent trend?
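To make the term-overlap part of the question concrete, here is a minimal Python sketch of how the common terms could be counted outside RapidMiner. Everything in it is an assumption for illustration: the toy bug summaries are made up, and `tokenize`, `vocabulary`, `common_terms` and the `min_doc_fraction` parameter are hypothetical names, not RapidMiner operators. It shows that with a plain union of vocabularies the overlap cannot shrink when a dataset is added, whereas it can shrink if terms are pruned by relative document frequency after the datasets are combined. Whether such pruning is active in the actual process is not stated in the question, so this is only one possibility to check.

```python
import re


def tokenize(text):
    """Lowercase a summary and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary(summaries, min_doc_fraction=0.0):
    """Return the terms that occur in at least min_doc_fraction of the summaries.

    min_doc_fraction is a hypothetical stand-in for any relative
    document-frequency pruning applied during term extraction.
    """
    doc_freq = {}
    for summary in summaries:
        for term in set(tokenize(summary)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    threshold = min_doc_fraction * len(summaries)
    return {term for term, count in doc_freq.items() if count >= threshold}


def common_terms(train_summaries, test_summaries, min_doc_fraction=0.0):
    """Terms shared by the (optionally pruned) training vocabulary and the test vocabulary."""
    return vocabulary(train_summaries, min_doc_fraction) & vocabulary(test_summaries)


# Made-up bug summaries, for illustration only.
dataset_a = ["crash on startup", "crash when saving file"]
dataset_b = [
    "button color wrong",
    "menu font too small",
    "icon missing in toolbar",
    "tooltip text truncated",
]
test_set = ["crash on exit", "file menu missing"]

# Without pruning, adding a dataset can only keep or grow the overlap.
print(len(common_terms(dataset_a, test_set)))
print(len(common_terms(dataset_a + dataset_b, test_set)))

# With relative pruning, terms that were frequent enough in the small
# training set can fall below the threshold in the larger combined set.
print(len(common_terms(dataset_a, test_set, min_doc_fraction=0.3)))
print(len(common_terms(dataset_a + dataset_b, test_set, min_doc_fraction=0.3)))
```

Comparing the overlap with and without any pruning or feature-selection step, for each training candidate, would show whether the drop in common terms comes from that step or from somewhere else in the process.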
