Why am I getting worse performance after GridSearchCV?

Shrey Jain @Shrey_Jain16

04 July 2021

I first build a baseline model with default parameters and compute its MAE on the test set (see attached image "rfr base").

# BASELINE MODEL
rfr_pipe.fit(train_x, train_y)
base_rfr_pred = rfr_pipe.predict(test_x)
base_rfr_mae = mean_absolute_error(test_y, base_rfr_pred)

MAE = 2.188

Then I run GridSearchCV to find the best parameters and read off the mean cross-validated MAE of the best candidate (see attached image "rfr grid").

# RFR GRIDSEARCHCV
rfr_param = {'rfr_model__n_estimators' : [10, 100, 500, 1000],
             'rfr_model__max_depth' : [None, 5, 10, 15, 20],
             'rfr_model__min_samples_leaf' : [10, 100, 500, 1000],
             'rfr_model__max_features' : ['auto', 'sqrt', 'log2']}

rfr_grid = GridSearchCV(estimator = rfr_pipe, param_grid = rfr_param, n_jobs = -1,
                        cv = 5, scoring = 'neg_mean_absolute_error')
rfr_grid.fit(train_x, train_y)

print('best parameters are:-', rfr_grid.best_params_)
print('best mae is:- ', -1 * rfr_grid.best_score_)

MAE = 2.697
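One thing that may help when reading these numbers: `best_score_` is the mean score over the 5 validation folds carved out of the training data, while the baseline MAE above was measured on `test_x`. The sketch below puts the default model and the grid search on the same footing by scoring both with the same cross-validation. It is only an illustration: the data is synthetic, `StandardScaler` stands in for the preprocessing step that is not shown in the question, and the grid is shortened so that it runs quickly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real train_x / train_y are not shown).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

# Hypothetical pipeline; StandardScaler replaces the unknown preprocessing step.
rfr_pipe = Pipeline(steps=[
    ("rfr_preproc", StandardScaler()),
    ("rfr_model", RandomForestRegressor(n_estimators=20, random_state=0)),
])

# Cross-validated MAE of the default model, measured exactly like the grid search.
base_cv_mae = -cross_val_score(
    rfr_pipe, train_x, train_y, cv=5, scoring="neg_mean_absolute_error"
).mean()

# Shortened grid that, unlike the one in the question, includes the default value 1.
small_grid = {"rfr_model__min_samples_leaf": [1, 10, 50]}
rfr_grid = GridSearchCV(rfr_pipe, small_grid, cv=5, scoring="neg_mean_absolute_error")
rfr_grid.fit(train_x, train_y)
grid_cv_mae = -rfr_grid.best_score_

print("default model, CV MAE:", base_cv_mae)
print("best candidate, CV MAE:", grid_cv_mae)
print("best parameters:", rfr_grid.best_params_)
```

When the default configuration is one of the candidates, the best cross-validated MAE cannot be worse than the default's cross-validated MAE. In the grid from the question, `min_samples_leaf` starts at 10 whereas scikit-learn's default is 1, so the default model is not among the candidates being compared.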

Then I refit a model with the "best parameters" found by the search, expecting an improved MAE, but the result is always worse than the baseline MAE (see attached image "opt rfr").

# OPTIMIZED RFR MODEL
opt_rfr = RandomForestRegressor(random_state = 69, criterion = 'mae', max_depth = None,
                                max_features = 'auto', min_samples_leaf = 10, n_estimators = 100)
opt_rfr_pipe = Pipeline(steps = [('rfr_preproc', preproc), ('opt_rfr_model', opt_rfr)])
opt_rfr_pipe.fit(train_x, train_y)
opt_rfr_pred = opt_rfr_pipe.predict(test_x)
opt_rfr_mae = mean_absolute_error(test_y, opt_rfr_pred)

MAE = 2.496
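As a cross-check on the hand-built model above, the fitted search object can be used directly: with the default `refit=True`, `best_estimator_` is the whole pipeline refit on all of the training data with `best_params_`. That avoids retyping the parameters and accidentally changing anything else (for example `criterion` or `random_state`, whose values in the searched pipeline are not shown in the question). A minimal sketch, again on synthetic data with a hypothetical `StandardScaler` preprocessing step and a shortened grid:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real data is not shown).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

# Hypothetical pipeline; StandardScaler replaces the unknown preprocessing step.
rfr_pipe = Pipeline(steps=[
    ("rfr_preproc", StandardScaler()),
    ("rfr_model", RandomForestRegressor(n_estimators=20, random_state=0)),
])

small_grid = {"rfr_model__min_samples_leaf": [1, 10, 50]}
rfr_grid = GridSearchCV(rfr_pipe, small_grid, cv=5, scoring="neg_mean_absolute_error")
rfr_grid.fit(train_x, train_y)

# Route 1: use the pipeline that the search already refit on all of train_x.
grid_pred = rfr_grid.best_estimator_.predict(test_x)
grid_test_mae = mean_absolute_error(test_y, grid_pred)

# Route 2: copy the searched pipeline and apply best_params_ programmatically,
# so every setting that was not tuned stays identical to the searched model.
manual_pipe = clone(rfr_pipe).set_params(**rfr_grid.best_params_)
manual_pipe.fit(train_x, train_y)
manual_test_mae = mean_absolute_error(test_y, manual_pipe.predict(test_x))

print("test MAE via best_estimator_:", grid_test_mae)
print("test MAE via manual refit:   ", manual_test_mae)
```

Both routes should give the same test MAE here because the only difference from the searched pipeline is the tuned parameters. If a manually rebuilt model scores differently from `best_estimator_`, some setting other than the tuned ones has probably changed.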

This happens not just once but every time, and with most of the models I try (linear regression, random forest regressor). I suspect there is something fundamentally wrong with my code, otherwise the problem wouldn't arise every time. Any idea what might be causing this?
