I first fit a baseline model with default parameters and compute its MAE on the test set (see the "rfr base" image).

# BASELINE MODEL

rfr_pipe.fit(train_x, train_y)

base_rfr_pred = rfr_pipe.predict(test_x)

base_rfr_mae = mean_absolute_error(test_y, base_rfr_pred)

MAE = 2.188

Then I run GridSearchCV to find the best parameters and report the best mean cross-validated MAE (see the "rfr grid" image).

# RFR GRIDSEARCHCV

rfr_param = {'rfr_model__n_estimators': [10, 100, 500, 1000],
             'rfr_model__max_depth': [None, 5, 10, 15, 20],
             'rfr_model__min_samples_leaf': [10, 100, 500, 1000],
             'rfr_model__max_features': ['auto', 'sqrt', 'log2']}

rfr_grid = GridSearchCV(estimator=rfr_pipe, param_grid=rfr_param, n_jobs=-1,
                        cv=5, scoring='neg_mean_absolute_error')

rfr_grid.fit(train_x, train_y)

print('best parameters are:-', rfr_grid.best_params_)

print('best mae is:- ', -1 * rfr_grid.best_score_)

MAE = 2.697
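One caveat on comparing these two numbers: the baseline MAE of 2.188 is measured on the held-out test set, while `best_score_` is the mean score over the 5 cross-validation folds of the training data, so they are not measured on the same samples. A minimal sketch of a like-for-like check follows. It scores the baseline with the same 5-fold CV. The synthetic data, the `StandardScaler` stand-in for `preproc`, and the small grid are all assumptions for illustration, since the real data and `rfr_pipe` definition are not shown.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real train/test split is not shown).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

# Hypothetical pipeline; StandardScaler stands in for the unseen `preproc`.
rfr_pipe = Pipeline(steps=[
    ("rfr_preproc", StandardScaler()),
    ("rfr_model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Baseline scored with 5-fold CV on the training data only.
base_cv_scores = cross_val_score(rfr_pipe, train_x, train_y, cv=5,
                                 scoring="neg_mean_absolute_error")
base_cv_mae = -base_cv_scores.mean()

# Illustrative grid that includes the default value min_samples_leaf=1.
rfr_param = {"rfr_model__min_samples_leaf": [1, 10, 50]}
rfr_grid = GridSearchCV(rfr_pipe, rfr_param, cv=5,
                        scoring="neg_mean_absolute_error")
rfr_grid.fit(train_x, train_y)
best_cv_mae = -rfr_grid.best_score_

print("baseline CV MAE:", base_cv_mae)
print("best grid CV MAE:", best_cv_mae)
```

Because the default configuration is one of the candidates here and the folds are the same, the best grid score cannot be worse than the baseline CV score. In the grid from the post, `min_samples_leaf` starts at 10 while scikit-learn's default is 1, so the default configuration is not among the candidates being searched.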

Then I refit a model with the "best parameters" to get an optimized MAE, but the result is always worse than the baseline MAE (see the "opt rfr" image).

# OPTIMIZED RFR MODEL

opt_rfr = RandomForestRegressor(random_state=69, criterion='mae', max_depth=None,
                                max_features='auto', min_samples_leaf=10, n_estimators=100)

opt_rfr_pipe = Pipeline(steps = [('rfr_preproc', preproc), ('opt_rfr_model', opt_rfr)])

opt_rfr_pipe.fit(train_x, train_y)

opt_rfr_pred = opt_rfr_pipe.predict(test_x)

opt_rfr_mae = mean_absolute_error(test_y, opt_rfr_pred)

MAE = 2.496

This happens not just once but every time, and with most of the models I try (linear regression, random forest regressor). I suspect something is fundamentally wrong with my code; otherwise the problem would not arise so consistently. Any idea what might be causing this?
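For the refit step above, retyping the best parameters by hand can silently change other settings. The hand-built `opt_rfr` sets `criterion='mae'` and `random_state=69`, which may or may not match what `rfr_pipe` used during the search (its definition is not shown). A minimal sketch of two ways to avoid retyping follows, assuming synthetic data and a `StandardScaler` stand-in for `preproc`: use `best_estimator_`, which GridSearchCV refits on the full training set by default, or clone the searched pipeline and apply `best_params_` with `set_params`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real data is not shown).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

# Hypothetical pipeline; StandardScaler stands in for the unseen `preproc`.
rfr_pipe = Pipeline(steps=[
    ("rfr_preproc", StandardScaler()),
    ("rfr_model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Illustrative grid, much smaller than the one in the post.
rfr_param = {"rfr_model__min_samples_leaf": [1, 10, 50],
             "rfr_model__max_features": ["sqrt", 1.0]}
rfr_grid = GridSearchCV(rfr_pipe, rfr_param, cv=5,
                        scoring="neg_mean_absolute_error")
rfr_grid.fit(train_x, train_y)

# Option 1: the pipeline GridSearchCV already refit on all training data.
grid_pred = rfr_grid.best_estimator_.predict(test_x)
grid_mae = mean_absolute_error(test_y, grid_pred)

# Option 2: clone the searched pipeline and apply the best parameters,
# so every setting not in the grid stays exactly as it was in the search.
opt_rfr_pipe = clone(rfr_pipe).set_params(**rfr_grid.best_params_)
opt_rfr_pipe.fit(train_x, train_y)
opt_rfr_pred = opt_rfr_pipe.predict(test_x)
opt_rfr_mae = mean_absolute_error(test_y, opt_rfr_pred)

print("best_estimator_ test MAE:", grid_mae)
print("set_params refit test MAE:", opt_rfr_mae)
```

With a fixed `random_state`, both options should give the same test predictions, which makes any remaining gap against the baseline easier to attribute to the parameter grid rather than to a mismatch in the refit.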
