I am trying to do a comparison between several interpolation techniques, such as Kriging, IDW and splines, for rainfall and temperature data, and I want to know whether using RMSE for the comparison is enough or whether I should also use other tests like MAE, MSE or R²?
The aim of the analysis (and of natural science in general) is to identify patterns that allow us to recognize relationships between variables and to make predictions. Using "tests" is not very helpful in this scientific process.
What is helpful is to describe the observed relationship between rainfall and temperature (plots!), possibly also considering other (presumably relevant) variables, to identify potentially relevant patterns, and to model them with an appropriate model. If you want to use the model for predictions, it is best to fit the model on only a subset of the data and use the rest of the data to investigate how well the model predicts the response variable. Depending on the model, this may not be sensibly expressed by any single number (e.g. the model might predict rainfall at higher temperatures with higher precision but with a larger bias than for lower temperatures). It may then be interesting to state how large the expected bias is, what a credible range for the expected response is, and also what a credible range for individual values is, all of this possibly for different values of the predictor(s).
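To make the hold-out idea concrete, here is a minimal Python/NumPy sketch; the data, the 70/30 split and the simple linear stand-in model are all hypothetical choices for illustration, not a recommendation of any particular model:

```python
# Minimal hold-out sketch (illustrative only): fit on a subset, check
# predictions on the rest, and look at bias and error spread separately
# for low and high temperatures. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
temp = rng.uniform(5, 35, 200)                      # hypothetical predictor
rain = 80 - 1.5 * temp + rng.normal(0, 10, 200)     # hypothetical response

idx = rng.permutation(temp.size)
train, test = idx[:150], idx[150:]                  # fit on ~75%, validate on the rest

coef = np.polyfit(temp[train], rain[train], deg=1)  # simple stand-in model
pred = np.polyval(coef, temp[test])
resid = rain[test] - pred

for name, mask in [("low temp", temp[test] < 20), ("high temp", temp[test] >= 20)]:
    r = resid[mask]
    print(name, "bias:", round(r.mean(), 2),
          "central 90% of errors:", np.round(np.percentile(r, [5, 95]), 2))
```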
Some frequently used visual comparison techniques are described here:
Vaze, J., Jordan, P., Beecham, R., Frost, A., Summerell, G. (2011). Guidelines for Rainfall-Runoff Modelling: Towards Best Practice Model Application.
With RMSE, squaring the errors gives more weight to large errors than to small ones, so the error estimate can be skewed by the odd outlier. In many circumstances it makes sense to give more weight to larger deviations, that is, being off by 10 is more than twice as bad as being off by 5. In such cases RMSE is the more appropriate measure of error.
If being off by 10 is just twice as bad as being off by 5, then MAE is more appropriate.
I believe that in your case calculating both RMSE and MAE would be more helpful for drawing the final conclusion about the best interpolation method.
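As a toy illustration of the difference (made-up numbers, not real data), a single large error inflates RMSE much more than MAE:

```python
# One gross outlier dominates RMSE but only shifts MAE moderately.
import numpy as np

obs  = np.array([10.0, 12.0, 11.0, 9.0, 10.0])
pred = np.array([11.0, 11.0, 10.0, 9.5, 25.0])   # last prediction is a gross outlier

err  = pred - obs
mae  = np.mean(np.abs(err))
rmse = np.sqrt(np.mean(err ** 2))

print("MAE :", round(mae, 2))    # ~3.7
print("RMSE:", round(rmse, 2))   # ~6.8, dominated by the single large error
```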
1. The spline is a special case of radial basis function interpolation, and kriging is equivalent to radial basis function interpolation (although the results will look different, since in the case of kriging the interpolating function is implied but not explicitly given, whereas in the case of radial basis functions the function is given explicitly).
2. The different "test statistics" you are considering are based on various statistical assumptions, e.g. random sampling, and any conclusions you might draw from them are only valid if the underlying assumptions are satisfied. There are no statistical assumptions for IDW; it is not derived from any assumptions, and hence there is no theory for IDW (a minimal sketch of what IDW computes is given after this list). There are likewise no statistical assumptions for the radial basis function interpolator. In the case of kriging the statistical assumptions are quite different: the data is not a random sample, but rather a non-random sample from one realization of a random function. Random selection of the data locations does not correspond to random sampling.
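For reference, a minimal sketch of what IDW actually computes: a purely algebraic, distance-weighted average with no statistical model behind it. The coordinates, values and power parameter below are arbitrary examples:

```python
# IDW estimate at one target location: a deterministic weighted average
# of the known values, with weights that decay with distance.
import numpy as np

def idw(xy_known, z_known, xy_target, p=2.0):
    """Inverse-distance-weighted estimate at a single target location."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):                 # target coincides with a data point
        return z_known[np.argmin(d)]
    w = 1.0 / d ** p                   # weights decay with distance
    return np.sum(w * z_known) / np.sum(w)

pts  = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([12.0, 15.0, 10.0, 18.0])   # e.g. rainfall at four gauges
print(idw(pts, vals, np.array([0.3, 0.4])))
```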
If you look at the literature you will find a number of papers that have done empirical studies of different interpolators for specific data sets (you won't be able to draw any conclusions from these about how the different methods compare in general). Note that the size of the data set, the spatial pattern of the data locations (a regular grid is not necessarily better) and various user-made decisions can all have an effect on the comparison.
A more critical question is what you want to do with the interpolation results, i.e. what kinds of questions you want to answer, and whether getting the data is easy/cheap or expensive/difficult. Generally speaking, more data locations are better, but there is also the question of measurement/analytical error, both for the data values and for the data locations.
Thank you, Dr. Myers. The goal behind the comparison of different interpolation techniques in my thesis is to choose the one that best captures the temporal and spatial variability of precipitation and temperature across the region. Obviously, I should run some tests like RMSE, MAE or R², etc. Do you think RMSE is enough, or are other tests like MAE advisable, as Pr. Seyyedi underlined above?
No, it is not. The RMSE is calculated from the residuals of a model fitted to all the data. Therefore it is the most optimistic estimate of the error you can get. This is like asking the cook if the meal is tasty. Cross-validation (CV) means fitting the model on a proportion p of the data but analysing the properties of the fit on the remaining proportion (1-p) of the data. This gives you a less biased view of the model performance.
The same problem applies to all the data you have. How can you know that this data tells you a correct story? You can't. You take what you have and calculate what you *should expect*, assuming that the data is representative enough, assuming that the model you have chosen to derive your expectations is sensible, and assuming some more things. In short: there is no inference possible without assumptions. And this is not meant to say that assumptions are something bad and disturbing. They are the very foundation that makes inference possible at all, just as one cannot swim without water.
The selection of the data used for fitting and for validation should not depend on the model or on anything else related to your analysis. Any dependency would introduce a bias. The best way to achieve this is what is usually called "random selection", so use a random number generator to select the subsets of the data.
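A small sketch of the two points above (hold-out validation and random selection of the split); the data and the polynomial stand-in model are hypothetical, and a seeded NumPy random number generator does the selection:

```python
# RMSE on the data used for fitting ("asking the cook") is optimistic;
# a random fit/validation split gives a less biased picture.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 6, 40)                     # hypothetical predictor
y = np.sin(x) + rng.normal(0, 0.3, 40)        # hypothetical response

def rmse(obs, pred):
    return np.sqrt(np.mean((obs - pred) ** 2))

# RMSE of a flexible model fitted to ALL the data
coef_all = np.polyfit(x, y, deg=7)
print("resubstitution RMSE:", round(rmse(y, np.polyval(coef_all, x)), 3))

# Random split with a seeded RNG: fit on a proportion p, validate on 1 - p
idx = rng.permutation(x.size)
n_fit = int(0.7 * x.size)
fit, val = idx[:n_fit], idx[n_fit:]
coef_fit = np.polyfit(x[fit], y[fit], deg=7)
print("hold-out RMSE      :", round(rmse(y[val], np.polyval(coef_fit, x[val])), 3))
```

The resubstitution RMSE will typically be the more optimistic of the two numbers, which is exactly the bias cross-validation is meant to avoid.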
Yes, both measures are good for the analysis. I used both, together with a bootstrapping test, in my study "Investigation and Validation of MODIS SST in the Northern Persian Gulf". In addition, the minimum, maximum, mean, standard deviation, and R² were used in that study. The relationship between the SST RMS and total column precipitable water vapor was also discussed in that work.
Before you use any of these statistics you need to think about the underlying statistical assumptions that determine the validity of the statistical tools. In general, the ones that have been mentioned are based on non-spatial methods and in particular assume that the data is obtained by random sampling. Kriging, however, is based on very different assumptions; in particular, the data is not a random sample, nor is normality an assumption (multivariate normality would be relevant, but most transforms will only result in univariate normality).
Cross-validation is one useful tool and it can be used to generate a number of relevant statistics. What software are you using, e.g. commercial such as ArcGIS, or open source such as the R package "gstat"? If by chance you are writing your own software, or using some you found on the web, you need to worry about whether it is correct and fully validated.
The suggestion above about normalizing with the "true mean" indicates a lack of understanding about the derivation of the kriging equations and the underlying theory. See the book "Geostatistics" by Chiles and Delfiner (J. Wiley). There are two editions and you want to see the second one.
I will be comparing three methods, IDW, ordinary kriging and spline, to interpolate temperature and precipitation data. I have never used R packages before, so I am going to work with ArcGIS and Excel. In that case, which methods do you think are better to use: RMSE and MAE, or RMSE and cross-validation?
As you know, Dr. Mercea, those methods differ from region to region. They depend on many factors. In the region I am studying, the comparison has never been done before, so I think my research will fill the gap.
For each method, test the significance of the difference between the observed values and the corresponding estimated values yielded by that method, using a suitable test statistic (the t statistic in this case).
By comparing the calculated values of t for the different methods, the methods themselves can be compared.
The method that corresponds to the smaller (absolute) value of t is the better method.
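If you follow this suggestion, the computation could look like the sketch below; the observed/estimated arrays are placeholders for your own validation data, and scipy.stats.ttest_rel is one standard implementation of the paired t test:

```python
# Hypothetical sketch: paired t test of observed values against each
# method's estimates at the same locations. Data are placeholders.
import numpy as np
from scipy import stats

observed = np.array([10.2, 11.5, 9.8, 12.0, 10.7, 11.1])
kriging  = np.array([10.0, 11.8, 9.6, 12.3, 10.5, 11.0])
idw      = np.array([10.9, 10.8, 10.5, 11.2, 11.6, 10.4])

for name, est in [("kriging", kriging), ("IDW", idw)]:
    t, p = stats.ttest_rel(observed, est)   # paired t test on the differences
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")
```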