I am trying to perform machine learning (classification) on a public health dataset to diagnose disease status (Positive, Negative). All of the features in my dataset are categorical, such as educational level (Illiterate, Primary, Secondary, Higher), gender (Male, Female), wealth index (Poor, Middle, Rich), marital status (Unmarried, Married, Divorced, Separated, Widowed), etc. Before performing machine learning, many researchers convert such features into numeric variables and then standardize them to zero mean and unit variance. Given my background in Statistics, this seems quite illogical to me.
I have applied a few tree-based models (DT, RF, GBM, XGBoost) and non-tree-based models (SVM, KNN) to the categorical variables without standardization, and obtained values of around 0.75 to 0.85 for accuracy, precision, sensitivity, specificity, and AUC.
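To make the practice described above concrete, here is a minimal sketch in plain Python of the two usual conversions (integer coding for an ordered feature, one-hot coding for an unordered one) followed by standardization. The toy records, the `EDUCATION_ORDER` mapping, and the helper names are my own assumptions for illustration; they are not taken from my dataset or from any particular library.

```python
from statistics import mean, pstdev

# Hypothetical toy records, made up for illustration only.
education = ["Illiterate", "Primary", "Secondary", "Higher", "Primary"]
marital = ["Married", "Unmarried", "Widowed", "Married", "Divorced"]

# Assumed ordering of the education levels (an illustrative choice).
EDUCATION_ORDER = {"Illiterate": 1, "Primary": 2, "Secondary": 3, "Higher": 4}


def ordinal_encode(values, order):
    """Map each ordered category to its integer code."""
    return [order[v] for v in values]


def one_hot_encode(values):
    """Return the sorted levels and one 0/1 indicator row per observation."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def standardize(xs):
    """Rescale to zero mean and unit (population) standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


education_codes = ordinal_encode(education, EDUCATION_ORDER)
marital_levels, marital_dummies = one_hot_encode(marital)
education_z = standardize(education_codes)

print(education_codes)
print(marital_levels)
print(marital_dummies)
print([round(z, 3) for z in education_z])
```

My concern is visible in the sketch: after integer coding, the step from Illiterate to Primary is treated as numerically equal to the step from Secondary to Higher, and standardization only rescales those codes without making the spacing any more meaningful.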
Now, I have the following question:
I applied several machine learning classifiers (Decision Trees, Random Forest, GBM, XGBoost, SVM, and KNN) to a public health dataset for disease status classification. All features were categorical (e.g., education level, gender, wealth index). I did not encode these categories as numerical values (e.g., Illiterate=1, Primary=2, etc.), nor did I standardize them.
I would appreciate suggestions from members of the scientific community who are experts in this field. What should I do in this case?