I want to evaluate the quality of a prediction by comparing predicted to observed values of a continuous variable X. Currently I am using the MAD statistic. I want to know whether there are additional statistics that offer this feature?
People usually distinguish two types of prediction error: bias and variance. Sometimes these are combined. See, for example, Chapter 7 of http://statweb.stanford.edu/~tibs/ElemStatLearn/. As for MAD, I assume that means mean or median absolute deviation. More often people use the square of these errors, but the absolute deviation lessens the impact of outliers and is therefore more robust in many circumstances.
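For concreteness, here is a minimal base R sketch (toy numbers, illustrative variable names) that computes bias, error variance, MAE and RMSE for the same predictions; note how the squared-error measure weights the larger errors more heavily:

```r
# Minimal sketch (base R): bias, error variance, MAE and RMSE.
# The vectors below are toy values for illustration only.
observed  <- c(10.2, 11.5, 9.8, 12.1, 10.9, 11.3)
predicted <- c(10.0, 11.9, 9.5, 12.6, 10.4, 11.0)

err  <- predicted - observed
bias <- mean(err)          # systematic over- or under-prediction
vrnc <- var(err)           # spread of the errors around the bias
mae  <- mean(abs(err))     # mean absolute error (one reading of "MAD")
rmse <- sqrt(mean(err^2))  # squaring penalises large errors more heavily

c(bias = bias, variance = vrnc, MAE = mae, RMSE = rmse)
```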
MAD-type statistics describe the behaviour of uniformly distributed values well. However, the behaviour of a continuous variable X is often affected by several factors. In that case it is better to use simulation together with an appropriate statistical criterion, such as Fisher's or Theil's. This situation frequently arises in environmental risk assessment (see "Risk-based Environmental Management (English version)", DOI: 10.13140/RG.2.1.4563.2089).
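If the Theil criterion meant here is Theil's inequality coefficient (my assumption; the cited report may use a different criterion), it can be computed directly in base R:

```r
# Hedged sketch: Theil's inequality coefficient U (Theil's original form),
# assuming `predicted` and `observed` are numeric vectors of equal length.
theil_u <- function(predicted, observed) {
  sqrt(mean((predicted - observed)^2)) /
    (sqrt(mean(predicted^2)) + sqrt(mean(observed^2)))
}

# 0 indicates a perfect forecast; values close to 1 indicate poor skill.
theil_u(predicted = c(10.0, 11.9, 9.5), observed = c(10.2, 11.5, 9.8))
```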
Might I suggest validating your prediction equation by using a cross-validation scheme? This approach is a generalization of what you are doing now. I assume you are using some type of regression scheme, so I will refer you to Harrell Jr., F., Regression Modeling Strategies, Springer. Harrell has written freely available cross-validation packages. His book explains the choice of cross-validation scheme by regression type and refers you to the packages; he has some newer ones that a Google search will find for you. All of these are written in R, a free statistical language. Best wishes.
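As a rough illustration of the idea (not Harrell's rms package, which automates this for many model types), a k-fold cross-validated MAE takes only a few lines of base R; the data frame and model below are simulated purely for the example:

```r
# Minimal k-fold cross-validation sketch in base R.
# `dat`, `x` and `y` are simulated here for illustration only.
set.seed(1)
dat   <- data.frame(x = rnorm(100))
dat$y <- 2 + 3 * dat$x + rnorm(100)

k     <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))  # random fold assignment

cv_mae <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = dat[folds != i, ])      # fit on k-1 folds
  pred <- predict(fit, newdata = dat[folds == i, ])
  mean(abs(pred - dat$y[folds == i]))              # out-of-sample MAE per fold
})

mean(cv_mae)  # cross-validated MAE
```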
I would prefer MAD (or MAE) to MSE for the reasons described in Section 3.1 of Davydenko and Fildes (2016). The main reasons are the following:
1) MAE will usually correspond to the loss function that was used to optimise the forecasts, even if transformations were applied to the data (and the forecasts were back-transformed). Transformations such as logs or Box-Cox are often used with real data to stabilise the variance.
2) MAE is a more efficient estimator of the population mean absolute error than MSE is of the population mean squared error.
3) The relative MAE is less biased than the relative MSE.
One problem with MAD (MAE) is that it is not robust. In our paper (Davydenko and Fildes, 2016) we show that a trimmed MAD gives a biased result due to the non-symmetrical distribution of absolute errors. We propose asymmetric trimming algorithms for calculating MAEs (see Section 4 in Davydenko and Fildes, 2016).
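A quick toy demonstration of the skewness point (this is only an illustration, not the asymmetric trimming algorithm from the paper): because absolute errors are right-skewed, an ordinary symmetric trimmed mean understates the true mean absolute error.

```r
# Toy sketch: absolute errors of a well-behaved forecast are right-skewed,
# so symmetric trimming pulls the estimated MAE downwards.
set.seed(1)
abs_err <- abs(rnorm(10000))   # simulated absolute errors
mean(abs_err)                  # untrimmed MAE estimate
mean(abs_err, trim = 0.05)     # symmetric 5% trimming biases the estimate low
```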
As for alternative measures (e.g., MAPE, percent better, MdAPE, MASE, GMRAE, etc.), we demonstrate that they simply do not reflect accuracy in terms of a symmetric linear loss and can be rather difficult to interpret (see Section 3 in Davydenko and Fildes, 2016).
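For readers who want to see how a couple of these alternative measures are computed, here is a hedged toy sketch using the previous observation as a naive benchmark (the numbers are made up):

```r
# Toy sketch: MAPE and a relative MAE against a naive "last value" benchmark.
observed  <- c(102, 110, 108, 115, 120, 118)
predicted <- c(100, 112, 105, 117, 119, 121)
naive     <- c(observed[1], head(observed, -1))  # previous observation as forecast

mape    <- 100 * mean(abs((observed - predicted) / observed))
rel_mae <- mean(abs(observed - predicted)) / mean(abs(observed - naive))

c(MAPE = mape, RelMAE = rel_mae)  # RelMAE < 1 means beating the naive benchmark
```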
Referenced chapters (Davydenko and Fildes): Measuring Forecasting Accuracy: Problems and Recommendations; Forecast Error Measures: Critical Review and Practical Recommendations.
You can also use the RMSE. Its practical drawback is that it exaggerates the effect of large errors even when they are very rare (though this can be a virtue for some applications, where a large error is unacceptable!). On the other hand, its advantage is that it is of the same nature as (and therefore comparable to) the standard deviation of the forecast error (and the corresponding interval) provided by the forecasting model (regression, adaptive smoothing, etc.).
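A small sketch of that sensitivity (toy error values): a single rare large error inflates the RMSE far more than the MAE.

```r
# Toy sketch: effect of one large error on MAE versus RMSE.
err <- c(1, -2, 1.5, -1, 0.5)
c(MAE = mean(abs(err)), RMSE = sqrt(mean(err^2)))

err_outlier <- c(err, 20)      # add a single rare large error
c(MAE = mean(abs(err_outlier)), RMSE = sqrt(mean(err_outlier^2)))
```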
Probabilistic predictive performance can be assessed by at least three basic criteria: mean, dispersion, and reliability of the prediction ensemble. With respect to the mean, the prediction ensemble is said to be accurate when it is centered on the cross-validation data. With respect to dispersion, the prediction ensemble is said to be precise when the prediction interval has a narrow band. With respect to reliability, the prediction ensemble is said to be more reliable when the prediction interval bands bracket more of the cross-validation data. Figure 1 in Elshall et al. (2018) illustrates the differences between the three criteria.
To assess model predictive performance, several single-criterion metrics or scoring rules are commonly used. Single-criterion metrics focus on one prediction criterion. For example, with respect to the mean, squared-residual-error metrics such as root mean squared error (RMSE) and Nash-Sutcliffe model efficiency (NSME) are the most commonly used. With respect to dispersion, commonly used metrics for the precision of the prediction interval include the ensemble standard deviation and sharpness. With respect to reliability, several simple to complex reliability metrics have been used, although it is not uncommon to assess reliability visually, since there is usually a trade-off between reliability and dispersion. Yet in the absence of a preference toward any single predictive-performance criterion, single-criterion metrics are generally insufficient for judging overall predictive performance, since there is always a puzzling trade-off between criteria that focus on different aspects of the prediction. Thus, in addition to single-criterion metrics, scoring rules are needed to provide a summary measure of the overall probabilistic predictive performance.
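As an illustration of how these three criteria can be quantified (a hedged sketch with a simulated ensemble; the variable names and the 90% interval are my choices, not prescribed by the paper):

```r
# Hedged sketch: accuracy (RMSE of the ensemble mean), sharpness (interval width)
# and reliability (coverage) for a simulated 200-member prediction ensemble.
set.seed(1)
obs <- rnorm(50, mean = 10, sd = 2)                  # cross-validation data (toy)
ens <- t(replicate(200, obs + rnorm(50, sd = 1.5)))  # members in rows, points in columns

ens_mean <- colMeans(ens)
lower    <- apply(ens, 2, quantile, probs = 0.05)
upper    <- apply(ens, 2, quantile, probs = 0.95)

rmse      <- sqrt(mean((ens_mean - obs)^2))     # mean criterion (accuracy)
sharpness <- mean(upper - lower)                # dispersion criterion (precision)
coverage  <- mean(obs >= lower & obs <= upper)  # reliability of the 90% interval

c(RMSE = rmse, sharpness = sharpness, coverage = coverage)
```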
Gneiting and Raftery (2007) provided an excellent theoretical and critical review of different scoring rules, which they define as follows: "scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes." Among the scoring rules for evaluating overall predictive performance are the log score, the continuous ranked probability score (CRPS), and the relative model score (RMS; Elshall et al. 2018).
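For example, the empirical CRPS of an ensemble forecast for a single observation can be computed directly from its definition (standard estimator; the ensemble and observation below are simulated for illustration):

```r
# Hedged sketch: empirical CRPS for one observation y and an ensemble x,
# CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|, plus a Gaussian log score.
crps_ensemble <- function(x, y) {
  mean(abs(x - y)) - 0.5 * mean(abs(outer(x, x, "-")))
}

set.seed(1)
x <- rnorm(200, mean = 10, sd = 1.5)  # ensemble forecast (toy)
y <- 10.7                             # value that materialises (toy)

crps_ensemble(x, y)                                # lower is better
-dnorm(y, mean = mean(x), sd = sd(x), log = TRUE)  # negative log score (lower is better)
```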
References:
Elshall, A.S., Ye, M., Pei, Y. et al. Stoch Environ Res Risk Assess (2018). https://doi.org/10.1007/s00477-018-1592-3
Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102(447):359–378