Hello, I have been trying to apply SMOTE to right-censored data.
How do I properly apply the SMOTE over-sampling technique to right-censored data? I have a dataset (210×16) in which the event (death) occurred in only about 7% of the observations, so I wanted to make the data more balanced. I applied SMOTE from imbalanced-learn after the train/test split, only to the training data, so that the model can still be evaluated on a realistic test sample. My code looks like this:
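```python
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTENC
from sksurv.util import Surv
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, stratify=y['status'], random_state=random_state)
X_smote = X_train.assign(time=y_train['time'])  # SMOTE needs a 1-D label, so carry time as a feature
sm = SMOTENC(random_state=random_state, categorical_features=categorical_features)
X_res, status_res = sm.fit_resample(X_smote, y_train['status'])
# Put the (interpolated) time back into a structured (status, time) target
y_train = Surv.from_arrays(event=status_res, time=X_res['time'], name_event='status')
X_train = X_res.drop(columns='time')
```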
My y is a structured NumPy array of ('status', 'time') pairs. I wasn't sure what to do with the time-to-event variable from y, so I moved it into X before over-sampling and put it back into y afterwards. Is this the correct approach?
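To see what happens to the time column when it is moved into X, here is a minimal NumPy sketch of SMOTE's interpolation step. The two rows and their columns (age, biomarker, time) are hypothetical, not taken from the dataset above. SMOTE creates a synthetic minority row as x_i + λ(x_nn − x_i) for a random λ in [0, 1), applied to every continuous column, including the appended time. Each synthetic event therefore gets a time between the times of its two parent events, while censored rows are never modified.

```python
import numpy as np

def smote_step(x_i, x_nn, rng):
    """One SMOTE interpolation: a random point on the segment from x_i towards its neighbour x_nn."""
    lam = rng.uniform(0.0, 1.0)
    return x_i + lam * (x_nn - x_i), lam

rng = np.random.default_rng(42)
# Hypothetical event rows (status = True): [age, biomarker, time], time appended as the last column
x_i = np.array([60.0, 1.2, 100.0])
x_nn = np.array([70.0, 2.0, 300.0])

synthetic, lam = smote_step(x_i, x_nn, rng)
print(f"lambda = {lam:.3f}, synthetic row = {synthetic}")
# The synthetic row is labelled status = True and its time lies between 100 and 300
```

One consequence of this design: synthetic events are only placed within the range of observed event times. Also, SMOTE's nearest-neighbour search uses Euclidean distance on the raw values, so a time column measured in days can dominate the distance unless the features are scaled first.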
I also found a paper on Balanced Random Survival Forests (BRSF): https://www.researchgate.net/publication/324055697_Balanced_Random_Survival_Forests_for_Extremely_Unbalanced_Right_Censored_Data
If my SMOTENC code above is not correct, or the approach itself is wrong, what would be the right way to do this, and what would the code look like?
I don't mind using Python or R.
In the end I just want to apply a balanced random survival forest, and I would appreciate any help.