My data is multivariate and multiclass, with a mix of discrete and continuous features. Currently I handle each feature separately: I estimate its distribution with kernel density estimation and replace each missing (non-measured) value with the "most probable" one, i.e. the mode of the estimated density. Does anybody have a tip, or maybe a paper I could read?
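For concreteness, here is a minimal sketch of the per-feature approach described above, using only the Python standard library. The names `kde_mode` and `impute_column`, the use of `None` as the missing-value marker, the Gaussian kernel, the normal-reference bandwidth, and the grid search for the mode are all assumptions made for illustration, not details of my actual setup. For discrete features the sketch simply takes the most frequent observed value instead of a kernel estimate.

```python
import math
from collections import Counter
from statistics import stdev


def kde_mode(observed, grid_size=200):
    """Approximate the mode of a continuous sample with a Gaussian KDE."""
    n = len(observed)
    # Normal-reference rule-of-thumb bandwidth (an assumption; tune for real data).
    h = 1.06 * stdev(observed) * n ** (-1 / 5)
    if h == 0:
        # All observed values are identical, so that value is the mode.
        return observed[0]
    lo, hi = min(observed), max(observed)
    # Evaluate the density on an evenly spaced grid over the observed range.
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]

    def density(x):
        # Unnormalised density; the constant factor does not move the argmax.
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in observed)

    return max(grid, key=density)


def impute_column(column, discrete):
    """Replace None entries in one feature column with its estimated mode."""
    observed = [v for v in column if v is not None]
    if discrete:
        # Most frequent observed category or integer value.
        fill = Counter(observed).most_common(1)[0][0]
    else:
        fill = kde_mode(observed)
    return [fill if v is None else v for v in column]


# Hypothetical toy columns, one discrete and one continuous.
print(impute_column(["a", "b", "a", None], discrete=True))
print(impute_column([1.0, 1.1, 0.9, 1.05, 5.0, None], discrete=False))
```

Note that this treats every feature independently, so it ignores both the class label and any correlation between features.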