I have a multi-class classification problem that I want to solve with the LCPN (Local Classifier per Parent Node) method of hierarchical classification. To do this, I have manually divided the original dataset into a hierarchy of 2 levels. Level 1 has four parent classes, and each parent class has its own child classes (level 2). Does anyone know whether the approach below is a correct way to solve this problem?

  • divide the entire dataset into a training set (70%) and a test set (30%)
  • oversample the training set (since it’s imbalanced)
  • train the parent classifier on the oversampled training set and test it on the entire test set
  • for each of the 4 sub classifiers under the parent classifier:

    4. take the resampled training set (the dataset obtained after oversampling the training set in the second step above)

    5. keep only the instances that belong to the particular sub classifier, because the resampled dataset contains instances belonging to all 4 sub classifiers

    6. create a dataset from those instances, including features and labels

    7. split the created dataset into a sub-training set (70%) and a sub-test set (30%)

    8. oversample the sub-training set (because the classes are imbalanced)

    9. train the model on the oversampled sub-training set from step 8 and test it on the sub-test set
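
To make the data flow concrete, the steps above can be sketched roughly as follows. This is only a minimal sketch in plain Python, not a definitive implementation. Everything named here is an assumption for illustration: each row is a `(features, parent_label, child_label)` tuple, `make_data` generates a synthetic imbalanced dataset, `oversample` does simple random duplication (the post does not say which oversampling method is used), the first oversampling balances on the parent label, and `NearestCentroid` is a placeholder for whatever classifier is actually used.

```python
import random
from collections import Counter, defaultdict

def split(rows, test_frac=0.3, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    n_test = int(round(len(rows) * test_frac))
    return rows[n_test:], rows[:n_test]

def oversample(rows, key, seed=0):
    """Random oversampling (assumed method): duplicate rows until every
    class, as given by key(row), is as large as the largest class."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choice(g) for _ in range(target - len(g)))
    return out

class NearestCentroid:
    """Placeholder classifier; any model with fit/predict could be used."""
    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [s / counts[c] for s in acc]
                          for c, acc in sums.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]

def accuracy(model, rows, key):
    preds = model.predict([r[0] for r in rows])
    return sum(p == key(r) for p, r in zip(preds, rows)) / len(rows)

def make_data(seed=0):
    """Hypothetical data: 4 parents, 2 imbalanced children per parent."""
    rng = random.Random(seed)
    rows = []
    for p in range(4):
        for c in range(2):
            for _ in range(40 if c == 0 else 15):
                x = [p * 10 + rng.gauss(0, 1), c * 10 + rng.gauss(0, 1)]
                rows.append((x, f"P{p}", f"P{p}_C{c}"))
    return rows

def run_lcpn(rows):
    parent_of = lambda r: r[1]
    child_of = lambda r: r[2]
    train, test = split(rows)                    # 70/30 split
    train_os = oversample(train, parent_of)      # oversample training set
    # Parent classifier: trained on oversampled set, tested on full test set.
    parent_clf = NearestCentroid().fit([r[0] for r in train_os],
                                       [parent_of(r) for r in train_os])
    results = {"parent": accuracy(parent_clf, test, parent_of)}
    child_clfs = {}
    for parent in sorted({parent_of(r) for r in train_os}):
        # Steps 4-6: keep only the oversampled rows under this parent.
        subset = [r for r in train_os if parent_of(r) == parent]
        sub_train, sub_test = split(subset)              # step 7
        sub_train_os = oversample(sub_train, child_of)   # step 8
        clf = NearestCentroid().fit([r[0] for r in sub_train_os],
                                    [child_of(r) for r in sub_train_os])
        child_clfs[parent] = clf
        results[parent] = accuracy(clf, sub_test, child_of)  # step 9
    return parent_clf, child_clfs, results

if __name__ == "__main__":
    _, _, results = run_lcpn(make_data())
    print(results)
```

Note that the sketch follows the listed steps literally, so each sub-test set is drawn from the already oversampled training data and may contain duplicates of rows that also land in the corresponding sub-training set.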
