
Hello, I have been trying to apply SMOTE to right-censored data.

How do I properly apply the SMOTE over-sampling technique to right-censored data? I have a dataset (210x16) in which the event (death) occurred in only around 7% of observations, so I wanted to make my data more balanced. I tried performing SMOTE (via SMOTENC from imbalanced-learn) after the train/test split, only on the training data so that the model can then be evaluated on a realistic test sample, like this:

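import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTENC

# X, y, random_state and categorical_features are assumed to be defined earlier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=random_state)

# Use the event indicator as the SMOTE target and carry time along as a feature
X_smote, y_smote = X_train.copy(), y_train['status'].copy()

X_smote['time'] = y_train['time'].astype(np.int64)

sm = SMOTENC(random_state=random_state, categorical_features=categorical_features)

X_train, y_train = sm.fit_resample(X_smote, y_smote)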

My y is a structured np.array of pairs ('status', 'time'). I'm confused about what to do with the time-to-event variable from y; I decided to put it in X and then move it back to y after over-sampling. Is this the correct approach?
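The round trip described here (time moved into X, then back into y) could be sketched as follows. This is only a minimal illustration with plain NumPy arrays and toy values, not a statement that the approach is statistically valid for censored data. The helper names attach_time and detach_time are hypothetical, and the resampling step itself is left as a comment. Note that SMOTE interpolates continuous features, so synthetic time values will generally not be integers; the sketch therefore stores time as float.

```python
import numpy as np

# Field names follow the ('status', 'time') layout described above
Y_DTYPE = [("status", bool), ("time", float)]


def attach_time(X, y):
    """Append y['time'] as the last column of X and return (X_aug, status)."""
    X_aug = np.column_stack([X, y["time"].astype(float)])
    return X_aug, y["status"].astype(bool)


def detach_time(X_res, status_res):
    """Split the last column off X_res and rebuild the structured y array."""
    y_res = np.empty(len(status_res), dtype=Y_DTYPE)
    y_res["status"] = status_res
    y_res["time"] = X_res[:, -1]
    return X_res[:, :-1], y_res


# Toy data: 3 observations, 2 features, one event (values are made up)
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y = np.array([(True, 5.0), (False, 12.0), (False, 9.0)], dtype=Y_DTYPE)

X_aug, status = attach_time(X, y)
# A resampler call such as sm.fit_resample(X_aug, status) would go here;
# it is skipped in this sketch, so the data passes through unchanged.
X_back, y_back = detach_time(X_aug, status)
```

With a pandas DataFrame the same idea applies: remove the 'time' column from the resampled frame and use it, together with the resampled status, to fill the structured array.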

I found some research on Balanced Random Survival Forest (BRSF): https://www.researchgate.net/publication/324055697_Balanced_Random_Survival_Forests_for_Extremely_Unbalanced_Right_Censored_Data

If the above code is not correct and the approach is not right, what would be the right way, and what would the right code look like?

I don't mind using Python or R.

I just want to apply a "balanced random survival forest" and would appreciate it if someone could help.
