In open source software development projects, a large amount of data is generated during each phase of software development, especially in bug tracking systems where different users report bugs and other issues. These data contain considerable noise, uncertainty, and trustworthiness problems. The question is how to handle these challenges; if they are not handled at the right time, the performance of classifiers or models can degrade.
The truthfulness of data is a major concern in open source software evolution. If uncertainty is an infection present in the data, how should it be handled? Can we take a subset of the data, check whether uncertainty actually exists as a factor, and verify whether the treatment we apply is working correctly?
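As a minimal sketch of this idea, the following Python snippet assumes a hypothetical bug-tracker export `bug_reports.csv` with a `summary` text column and a `severity` label column (these names, the noise rate, and the filtering step are illustrative assumptions, not a fixed method). It samples a subset, injects label noise to simulate uncertainty, and compares classifier accuracy before and after a simple noise-filtering treatment.

```python
# Sketch: check whether injected uncertainty (label noise) degrades a classifier
# on a subset of bug reports, and whether a simple filtering treatment helps.
# Assumes a hypothetical bug_reports.csv with 'summary' and 'severity' columns.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Take a subset of the reported bugs (hypothetical file and size).
df = pd.read_csv("bug_reports.csv").sample(n=2000, random_state=42)
X = TfidfVectorizer(max_features=5000).fit_transform(df["summary"])
y = df["severity"].to_numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Simulate uncertainty: flip roughly 20% of the training labels.
noisy = y_tr.copy()
flip = rng.random(len(noisy)) < 0.20
noisy[flip] = rng.permutation(y_tr)[flip]

def accuracy_with(labels):
    clf = LogisticRegression(max_iter=1000).fit(X_tr, labels)
    return accuracy_score(y_te, clf.predict(X_te))

print("clean labels   :", accuracy_with(y_tr))
print("noisy labels   :", accuracy_with(noisy))

# One possible "treatment": drop training rows whose cross-validated prediction
# disagrees with the (noisy) label, then retrain and re-check accuracy.
pred = cross_val_predict(LogisticRegression(max_iter=1000), X_tr, noisy, cv=5)
keep = pred == noisy
clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], noisy[keep])
print("after treatment:", accuracy_score(y_te, clf.predict(X_te)))
```

Comparing the three printed accuracies on a subset gives a quick indication of whether uncertainty is a real factor in the data and whether the chosen treatment recovers part of the lost performance.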
We are not concerned here with data volume or with handling very large datasets through cloud infrastructure, Hadoop/MapReduce, Radoop, multiple compute nodes, or partitioning of the data.