I first build a baseline model with default parameters and compute its MAE on the test set (rfr base file for image).

# BASELINE MODEL

rfr_pipe.fit(train_x, train_y)

base_rfr_pred = rfr_pipe.predict(test_x)

base_rfr_mae = mean_absolute_error(test_y, base_rfr_pred)

MAE = 2.188

Then I run GridSearchCV to find the best parameters and read off the mean cross-validated MAE of the best candidate (rfr grid for image).

# RFR GRIDSEARCHCV

rfr_param = {'rfr_model__n_estimators' : [10, 100, 500, 1000],
             'rfr_model__max_depth' : [None, 5, 10, 15, 20],
             'rfr_model__min_samples_leaf' : [10, 100, 500, 1000],
             'rfr_model__max_features' : ['auto', 'sqrt', 'log2']}

rfr_grid = GridSearchCV(estimator = rfr_pipe, param_grid = rfr_param, n_jobs = -1,
                        cv = 5, scoring = 'neg_mean_absolute_error')

rfr_grid.fit(train_x, train_y)

print('best parameters are:-', rfr_grid.best_params_)

print('best mae is:- ', -1 * rfr_grid.best_score_)

MAE = 2.697
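
One thing that may explain part of the gap: the 2.188 above is measured on the held-out test set, while `best_score_` is a mean over five validation folds of the training data, so the two numbers are not directly comparable. The grid above also never includes the default `min_samples_leaf = 1`, so the search cannot return the baseline configuration. A minimal sketch of a like-for-like comparison follows, assuming synthetic data from `make_regression` and a `StandardScaler` standing in for the `preproc` step (both are placeholders, not the original data or preprocessing):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data; the original train_x / train_y are not shown in the post.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

# Placeholder pipeline; StandardScaler stands in for the unknown `preproc` step.
pipe = Pipeline(steps=[('rfr_preproc', StandardScaler()),
                       ('rfr_model', RandomForestRegressor(n_estimators=50, random_state=0))])

# Baseline scored the same way GridSearchCV scores its candidates:
# 5-fold cross-validation on the training data only.
base_cv_mae = -cross_val_score(pipe, train_x, train_y, cv=5,
                               scoring='neg_mean_absolute_error').mean()

# The grid includes the default value (min_samples_leaf=1), so the baseline
# configuration is one of the candidates.
param = {'rfr_model__min_samples_leaf': [1, 10, 50]}
grid = GridSearchCV(estimator=pipe, param_grid=param, cv=5,
                    scoring='neg_mean_absolute_error')
grid.fit(train_x, train_y)
grid_cv_mae = -grid.best_score_

print('baseline CV MAE:', base_cv_mae)
print('grid best CV MAE:', grid_cv_mae)
```

With the default configuration inside the grid and both numbers computed on the same folds, the grid's best CV MAE should not come out worse than the baseline's CV MAE.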

Then I fit a new model with the "best parameters" to get an optimized MAE, but the result is always worse than the base model's MAE (opt rfr for image).

# OPTIMIZED RFR MODEL

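# Note: criterion = 'mae' was renamed 'absolute_error' and max_features = 'auto'
# was removed in newer scikit-learn releases; the values below are as originally run.
opt_rfr = RandomForestRegressor(random_state = 69, criterion = 'mae', max_depth = None,
                                max_features = 'auto', min_samples_leaf = 10, n_estimators = 100)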

opt_rfr_pipe = Pipeline(steps = [('rfr_preproc', preproc), ('opt_rfr_model', opt_rfr)])

opt_rfr_pipe.fit(train_x, train_y)

opt_rfr_pred = opt_rfr_pipe.predict(test_x)

opt_rfr_mae = mean_absolute_error(test_y, opt_rfr_pred)

MAE = 2.496
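
Re-typing the best parameters by hand can also introduce differences from what was actually searched. The model above sets `criterion = 'mae'`, which is not part of the grid, and the definition of `rfr_pipe` is not shown, so its criterion and `random_state` may differ. A minimal sketch of scoring the refit estimator that GridSearchCV already holds, again assuming placeholder synthetic data and a `StandardScaler` in place of `preproc`:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data and preprocessing; not the original dataset or `preproc`.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

pipe = Pipeline(steps=[('rfr_preproc', StandardScaler()),
                       ('rfr_model', RandomForestRegressor(n_estimators=50, random_state=0))])

grid = GridSearchCV(estimator=pipe,
                    param_grid={'rfr_model__min_samples_leaf': [1, 10, 50]},
                    cv=5, scoring='neg_mean_absolute_error')
grid.fit(train_x, train_y)

# With the default refit=True, best_estimator_ is the whole pipeline refit on
# all of the training data using best_params_, so nothing has to be re-typed.
best_pipe = grid.best_estimator_
grid_test_mae = mean_absolute_error(test_y, best_pipe.predict(test_x))
print('test MAE of refit best estimator:', grid_test_mae)
```

This test-set MAE is the number that can be compared directly with the baseline's test-set MAE, since both are computed on `test_x` and `test_y`.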

This happens not just once but every time, and with most of the models I try (linear regression, random forest regressor). I suspect something is fundamentally wrong with my code; otherwise the problem would not arise every time. Any idea what might be causing this?
