Assume that a dataset has a mix of categorical and numerical attributes. The dataset has to undergo numeric processing, which requires converting the categorical attributes to a numeric (quantified) form.

But if we do this, irrespective of the encoding strategy we employ (dummy variables, probability weights, etc.), do we always lose information? And how does one measure this information loss?
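To make the question concrete, here is a minimal sketch of one possible measure, assuming "information" means Shannon entropy of the empirical category distribution. The loss is then the conditional entropy H(original | encoded) = H(original, encoded) - H(encoded), which is zero exactly when the original category can be recovered from its code. The function names and the toy data are hypothetical, and "probability weights" is read here as replacing each category by its relative frequency, which is my assumption about what that strategy means.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_loss(original, encoded):
    """Conditional entropy H(original | encoded) in bits.

    Computed as H(original, encoded) - H(encoded). A value of 0 means the
    encoding is invertible on this data, i.e. nothing was lost.
    """
    joint = list(zip(original, encoded))
    return entropy(joint) - entropy(encoded)


# Hypothetical toy attribute: "green" and "blue" occur equally often.
colors = ["red", "green", "blue", "red", "green", "blue", "red", "red"]

# Dummy-variable (one-hot) encoding: one indicator per category.
levels = sorted(set(colors))
one_hot = [tuple(int(c == level) for level in levels) for c in colors]

# Frequency encoding (assumed meaning of "probability weights"):
# each category is replaced by its relative frequency.
freq = {c: k / len(colors) for c, k in Counter(colors).items()}
freq_encoded = [freq[c] for c in colors]

loss_one_hot = information_loss(colors, one_hot)
loss_freq = information_loss(colors, freq_encoded)

print(f"H(colors)           = {entropy(colors):.3f} bits")
print(f"loss, one-hot       = {loss_one_hot:.3f} bits")
print(f"loss, frequency enc = {loss_freq:.3f} bits")
```

On this toy data the one-hot encoding loses nothing, because each category maps to a distinct vector, while the frequency encoding merges "green" and "blue" into the same value and so loses part of the attribute's entropy. This only measures whether the category labels stay distinguishable; it says nothing about spurious structure an encoding may add, such as an artificial ordering.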

Any references or resource links would be most helpful!

Thanks in advance
