I am working on sales data for one of our customers and performing some exploratory analysis.

We have almost 100 million rows of data. Because this is sales/purchase data, gender is a very critical variable.

Of these 100 million rows, almost 10 million do not have gender information.

Please suggest techniques for handling these missing values, as we cannot drop these rows because of their high contribution.

I also read that the KNN algorithm can be used for categorical variables. I have not applied this technique before, so please advise on it or on any alternative approach for this situation.
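To make the KNN idea concrete, here is a minimal sketch of how I understand it: each row with a missing gender takes the majority gender of its k most similar rows that do have one. Everything in it is an assumption for illustration only: the function names, the toy rows, the three feature columns (product category, channel, region) and the choice of Hamming distance are hypothetical and not taken from our actual data.

```python
from collections import Counter

def hamming(a, b):
    """Count the positions at which two equal-length feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_impute_gender(rows, k=3):
    """Return a copy of rows in which each missing 'gender' is filled by a
    majority vote among the k nearest labelled rows (Hamming distance)."""
    labelled = [r for r in rows if r["gender"] is not None]
    filled = []
    for r in rows:
        if r["gender"] is not None:
            filled.append(dict(r))  # already labelled: copy unchanged
            continue
        # Brute-force search over all labelled rows for the k closest ones.
        nearest = sorted(
            labelled, key=lambda lab: hamming(r["features"], lab["features"])
        )[:k]
        vote = Counter(n["gender"] for n in nearest).most_common(1)[0][0]
        filled.append({**r, "gender": vote})
    return filled

# Hypothetical toy data: features = (product category, channel, region).
rows = [
    {"features": ("cosmetics", "online", "north"), "gender": "F"},
    {"features": ("cosmetics", "store", "north"), "gender": "F"},
    {"features": ("cosmetics", "online", "south"), "gender": "F"},
    {"features": ("tools", "store", "south"), "gender": "M"},
    {"features": ("tools", "online", "south"), "gender": "M"},
    {"features": ("tools", "store", "north"), "gender": "M"},
    {"features": ("cosmetics", "online", "north"), "gender": None},
    {"features": ("tools", "store", "south"), "gender": None},
]

imputed = knn_impute_gender(rows, k=3)
```

A brute-force search like this compares every unlabelled row with every labelled row, so at 100 million rows it would presumably need a sample of the labelled rows or an approximate neighbour search rather than this exact form.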

Regards.
