I want to evaluate the quality of a prediction by comparing predicted to observed values of a continuous variable X. Currently I am using the MAD statistic. I want to know whether there are additional statistics that offer this feature?
People usually distinguish two types of prediction error: bias and variance. Sometimes these are combined. See, for example, Chapter 7 of http://statweb.stanford.edu/~tibs/ElemStatLearn/. As for MAD, I assume that means mean or median absolute deviation. More often people use the square of these errors, but the absolute deviation lessens the impact of outliers and is therefore more robust in many circumstances.
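For concreteness, here is a minimal base R sketch (toy numbers, illustrative variable names) that computes bias, error variance, MAE and RMSE for the same predictions; note how the squared-error measure weights the larger errors more heavily:

```r
# Minimal sketch (base R): bias, error variance, MAE and RMSE.
# The vectors below are toy values for illustration only.
observed  <- c(10.2, 11.5, 9.8, 12.1, 10.9, 11.3)
predicted <- c(10.0, 11.9, 9.5, 12.6, 10.4, 11.0)

err  <- predicted - observed
bias <- mean(err)          # systematic over- or under-prediction
vrnc <- var(err)           # spread of the errors around the bias
mae  <- mean(abs(err))     # mean absolute error (one reading of "MAD")
rmse <- sqrt(mean(err^2))  # squaring penalises large errors more heavily

c(bias = bias, variance = vrnc, MAE = mae, RMSE = rmse)
```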
MAD-type statistics describe the behaviour of uniformly distributed values well. However, the behaviour of a continuous variable X is often affected by several factors. In that case it is better to use simulation together with an appropriate statistical criterion, such as Fisher's or Theil's. This situation frequently arises in environmental risk assessment (see "Risk-based Environmental Management (English version)", DOI: 10.13140/RG.2.1.4563.2089).
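If the Theil criterion meant here is Theil's inequality coefficient (my assumption; the cited report may use a different criterion), it can be computed directly in base R:

```r
# Hedged sketch: Theil's inequality coefficient U (Theil's original form),
# assuming `predicted` and `observed` are numeric vectors of equal length.
theil_u <- function(predicted, observed) {
  sqrt(mean((predicted - observed)^2)) /
    (sqrt(mean(predicted^2)) + sqrt(mean(observed^2)))
}

# 0 indicates a perfect forecast; values close to 1 indicate poor skill.
theil_u(predicted = c(10.0, 11.9, 9.5), observed = c(10.2, 11.5, 9.8))
```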
Might I suggest validating your prediction equation by using a cross-validation scheme? This approach is a generalization of what you are doing now. I assume you are using some type of regression scheme, so I will refer you to Harrell Jr., F., Regression Modeling Strategies, Springer. Harrell has written freely available cross-validation packages. His book explains the choice of cross-validation scheme by regression type and refers you to the packages; he has some newer ones that a Google search will find for you. All of these are written in R, a free statistical language. Best wishes.
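As a rough illustration of the idea (not Harrell's rms package, which automates this for many model types), a k-fold cross-validated MAE takes only a few lines of base R; the data frame and model below are simulated purely for the example:

```r
# Minimal k-fold cross-validation sketch in base R.
# `dat`, `x` and `y` are simulated here for illustration only.
set.seed(1)
dat   <- data.frame(x = rnorm(100))
dat$y <- 2 + 3 * dat$x + rnorm(100)

k     <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))  # random fold assignment

cv_mae <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = dat[folds != i, ])      # fit on k-1 folds
  pred <- predict(fit, newdata = dat[folds == i, ])
  mean(abs(pred - dat$y[folds == i]))              # out-of-sample MAE per fold
})

mean(cv_mae)  # cross-validated MAE
```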
I would prefer MAD (or MAE) to MSE for the reasons described in Section 3.1 of Davydenko and Fildes (2016). The main reasons are the following:
1) MAE will usually correspond to the loss function that was used to optimise the forecasts, even if transformations were applied to the data (and the forecasts were back-transformed). Transformations such as logs or Box-Cox are often used with real data to stabilise the variance.
2) MAE is a more efficient estimator of the population mean absolute error than MSE is of the population mean squared error.
3) The relative MAE is less biased than the relative MSE.
One problem with MAD (MAE) is that it is not robust. In our paper (Davydenko and Fildes, 2016) we show that a trimmed MAD gives a biased result due to the non-symmetrical distribution of absolute errors. We propose asymmetric trimming algorithms for calculating MAEs (see Section 4 in Davydenko and Fildes, 2016).
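A quick toy demonstration of the skewness point (this is only an illustration, not the asymmetric trimming algorithm from the paper): because absolute errors are right-skewed, an ordinary symmetric trimmed mean understates the true mean absolute error.

```r
# Toy sketch: absolute errors of a well-behaved forecast are right-skewed,
# so symmetric trimming pulls the estimated MAE downwards.
set.seed(1)
abs_err <- abs(rnorm(10000))   # simulated absolute errors
mean(abs_err)                  # untrimmed MAE estimate
mean(abs_err, trim = 0.05)     # symmetric 5% trimming biases the estimate low
```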
As for alternative measures (e.g., MAPE, percent better, MdAPE, MASE, GMRAE, etc.), we demonstrate that they simply do not reflect accuracy in terms of a symmetric linear loss and can be rather difficult to interpret (see Section 3 in Davydenko and Fildes, 2016).
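For readers who want to see how a couple of these alternative measures are computed, here is a hedged toy sketch using the previous observation as a naive benchmark (the numbers are made up):

```r
# Toy sketch: MAPE and a relative MAE against a naive "last value" benchmark.
observed  <- c(102, 110, 108, 115, 120, 118)
predicted <- c(100, 112, 105, 117, 119, 121)
naive     <- c(observed[1], head(observed, -1))  # previous observation as forecast

mape    <- 100 * mean(abs((observed - predicted) / observed))
rel_mae <- mean(abs(observed - predicted)) / mean(abs(observed - naive))

c(MAPE = mape, RelMAE = rel_mae)  # RelMAE < 1 means beating the naive benchmark
```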
Referenced chapters (Davydenko and Fildes): Measuring Forecasting Accuracy: Problems and Recommendations; Forecast Error Measures: Critical Review and Practical Recommendations.
You can also use the RMSE. Its practical drawback is that it exaggerates the effect of large errors even when they are very rare (though this can be a virtue for some applications, where a large error is unacceptable!). On the other hand, its advantage is that it is of the same nature as (and therefore comparable to) the standard deviation of the forecast error (and the corresponding interval) provided by the forecasting model (regression, adaptive smoothing, etc.).
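A small sketch of that sensitivity (toy error values): a single rare large error inflates the RMSE far more than the MAE.

```r
# Toy sketch: effect of one large error on MAE versus RMSE.
err <- c(1, -2, 1.5, -1, 0.5)
c(MAE = mean(abs(err)), RMSE = sqrt(mean(err^2)))

err_outlier <- c(err, 20)      # add a single rare large error
c(MAE = mean(abs(err_outlier)), RMSE = sqrt(mean(err_outlier^2)))
```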
Probabilistic predictive performance can be assessed by at least three basic criteria: mean, dispersion, and reliability of the prediction ensemble. With respect to the mean, the prediction ensemble is said to be accurate when it is centered on the cross-validation data. With respect to dispersion, the prediction ensemble is said to be precise when the prediction interval has a narrow band. With respect to reliability, the prediction ensemble is said to be more reliable when the prediction interval bands bracket more of the cross-validation data. Figure 1 in Elshall et al. (2018) illustrates the differences between the three criteria.
To assess model predictive performance, several single-criterion metrics or scoring rules are commonly used. Single-criterion metrics focus on one prediction criterion. For example, with respect to the mean, squared-residual-error metrics such as root mean squared error (RMSE) and Nash-Sutcliffe model efficiency (NSME) are the most commonly used. With respect to dispersion, commonly used metrics for the precision of the prediction interval include the ensemble standard deviation and sharpness. With respect to reliability, several simple to complex reliability metrics have been used, although it is not uncommon to assess reliability visually, since there is usually a trade-off between reliability and dispersion. Yet in the absence of a preference toward any single predictive-performance criterion, single-criterion metrics are generally insufficient for judging overall predictive performance, since there is always a puzzling trade-off between criteria that focus on different aspects of the prediction. Thus, in addition to single-criterion metrics, scoring rules are needed to provide a summary measure of the overall probabilistic predictive performance.
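As an illustration of how these three criteria can be quantified (a hedged sketch with a simulated ensemble; the variable names and the 90% interval are my choices, not prescribed by the paper):

```r
# Hedged sketch: accuracy (RMSE of the ensemble mean), sharpness (interval width)
# and reliability (coverage) for a simulated 200-member prediction ensemble.
set.seed(1)
obs <- rnorm(50, mean = 10, sd = 2)                  # cross-validation data (toy)
ens <- t(replicate(200, obs + rnorm(50, sd = 1.5)))  # members in rows, points in columns

ens_mean <- colMeans(ens)
lower    <- apply(ens, 2, quantile, probs = 0.05)
upper    <- apply(ens, 2, quantile, probs = 0.95)

rmse      <- sqrt(mean((ens_mean - obs)^2))     # mean criterion (accuracy)
sharpness <- mean(upper - lower)                # dispersion criterion (precision)
coverage  <- mean(obs >= lower & obs <= upper)  # reliability of the 90% interval

c(RMSE = rmse, sharpness = sharpness, coverage = coverage)
```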
Gneiting and Raftery (2007) provided an excellent theoretical and critical review of different scoring rules, which they define as follows: "scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes." Among the scoring rules for evaluating overall predictive performance are the log score, the continuous ranked probability score (CRPS), and the relative model score (RMS; Elshall et al. 2018).
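For example, the empirical CRPS of an ensemble forecast for a single observation can be computed directly from its definition (standard estimator; the ensemble and observation below are simulated for illustration):

```r
# Hedged sketch: empirical CRPS for one observation y and an ensemble x,
# CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|, plus a Gaussian log score.
crps_ensemble <- function(x, y) {
  mean(abs(x - y)) - 0.5 * mean(abs(outer(x, x, "-")))
}

set.seed(1)
x <- rnorm(200, mean = 10, sd = 1.5)  # ensemble forecast (toy)
y <- 10.7                             # value that materialises (toy)

crps_ensemble(x, y)                                # lower is better
-dnorm(y, mean = mean(x), sd = sd(x), log = TRUE)  # negative log score (lower is better)
```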
References:
Elshall, A.S., Ye, M., Pei, Y. et al. Stoch Environ Res Risk Assess (2018). https://doi.org/10.1007/s00477-018-1592-3
Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102(447):359–378