I have had the pleasure and misfortune of finding and interpolating rainfall data before; here are some tips. If the rainfall is zero because there was no precipitation, leave the value as zero. If there was precipitation but the measurements were not collected or are missing, you may choose to interpolate the value. Rainfall is a notoriously stochastic variable, so it is much more difficult to predict or interpolate than temperature. That said, it also depends on the timescale of your time series. If you have many months, say more than 60, it is easier to interpolate using a moving average or a cubic spline/polynomial smoother. If your main concern is correlation involving rainfall rather than time series analysis or forecasting, you may choose to leave the monthly value missing, and it will be excluded from the analysis altogether. Hope that helps! If you have a lot of missing data and your geographic area is relatively homogeneous, you can also download precipitation data from TRMM (the Tropical Rainfall Measuring Mission), hosted by NASA. Search for GIOVANNI-TOVAS and you will find the website to get the data.
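To illustrate the point about leaving a month missing when only correlation matters, here is a minimal sketch in plain Python. It assumes missing months are stored as `None` and true dry months as `0.0`; the function name `pearson_complete_pairs` and the second series `other` are hypothetical and made up for illustration.

```python
from math import sqrt

def pearson_complete_pairs(x, y):
    """Pearson correlation using only pairs where both values were observed."""
    # Drop any pair with a missing (None) value; zeros are real data and are kept.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    # Note: this divides by zero if either series is constant over the kept pairs.
    return cov / sqrt(var_x * var_y)

# Made-up monthly values: the third month of rainfall was not recorded.
rain = [0.0, 12.0, None, 30.0, 8.0]
other = [1.0, 25.0, 40.0, 61.0, 17.0]
r = pearson_complete_pairs(rain, other)
print(round(r, 6))
```

The dry month (`0.0`) contributes to the correlation, while the unrecorded month is simply skipped, so nothing has to be invented for it.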
I say the same. Instead of deleting, you can replace the missing data. I do not think a single best method for interpolating missing data has been established. You can use linear temporal interpolation, which fills a gap with the average of the values immediately preceding and immediately following it in time. Another way is to use the median: replace missing values with the long-term median for that period in the cycle. Caution! ... if you have evidence of trends.
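The two fills described above could be sketched roughly as follows. This is only an illustration under stated assumptions: gaps are marked with `None`, only single-value gaps with observed neighbours are filled by the neighbour mean, and the function names and the sample numbers are hypothetical.

```python
from statistics import median

def fill_neighbor_mean(series):
    """Fill an isolated gap with the mean of the values just before and after it."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None and 0 < i < len(series) - 1:
            before, after = series[i - 1], series[i + 1]
            # Only fill when both neighbours were actually observed.
            if before is not None and after is not None:
                filled[i] = (before + after) / 2
    return filled

def fill_cycle_median(series, period=12):
    """Fill gaps with the median of observed values at the same position in the cycle."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            # All observations sharing this position in the cycle (e.g. every January).
            same_phase = [v for v in series[i % period::period] if v is not None]
            if same_phase:
                filled[i] = median(same_phase)
    return filled

# Made-up totals in mm; None is a missing observation, 0.0 is a dry month.
rain = [10.0, None, 30.0, 0.0, 5.0, None]
print(fill_neighbor_mean(rain))           # the trailing gap has no later neighbour, so it stays None
print(fill_cycle_median(rain, period=3))  # a short period of 3 is used only to keep the example small
```

Neither function touches the `0.0` entry, which stays a genuine observation. With a trend in the series, the cycle median in particular can be misleading, which is the caution raised above.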
My answer is almost the same as the answers above. On the data side, you need to store the raw value either as a default value (for example, zero, as you say) or as a null value.
I don't think rainfall data is well suited to interpolation. So, in the application, it is better to show that the value is not available for January, isn't it?
Either a default value or a null value can be used, but a null value carries one more piece of information: that the observation is absent.
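As a small sketch of why a null value is more informative than a default, assume the months are kept in a dictionary with `None` for "not observed". The names `describe` and `records` and the values are hypothetical.

```python
def describe(month, rainfall_mm):
    """Render one month, keeping 'not observed' distinct from 'no rain'."""
    if rainfall_mm is None:
        return f"{month}: not available"
    return f"{month}: {rainfall_mm:.1f} mm"

# Made-up records: January was never measured, February was measured and dry.
records = {"Jan": None, "Feb": 0.0, "Mar": 42.5}
lines = [describe(month, value) for month, value in records.items()]
print("\n".join(lines))
```

Had January been stored as the default `0.0`, it would be indistinguishable from the dry February.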
This seems clear: if there is no rain, the data point should be 0. Then again, if your analysis pipeline has trouble with 0 (e.g. one assuming lognormally distributed errors), I would suggest adding a small value epsilon, such as 0.001. I would strongly advise against treating it as a missing value and imputing it.
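A minimal sketch of the epsilon idea, assuming the downstream step takes a natural log. Here the offset is added to every observed value rather than only to zeros, which is one possible choice and not necessarily what any particular pipeline requires. The name `log_rainfall` is hypothetical, and missing values (`None`) are passed through untouched.

```python
import math

EPSILON = 0.001  # small offset suggested above; the right size depends on your units

def log_rainfall(values, epsilon=EPSILON):
    """Log-transform observed rainfall after adding a small offset so that 0 is allowed."""
    return [None if v is None else math.log(v + epsilon) for v in values]

# Made-up values in mm: a dry month, a missing month, and a wet month.
transformed = log_rainfall([0.0, None, 10.0])
print(transformed)
```

The dry month gets a finite log value instead of an error, and the missing month stays missing rather than being treated as zero.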
Exactly. This point was not clear to me because Arnab talks about a "not available" observation ("NA") and "there is no rainfall" ("0") at the same time. I want to make it very clear that my answer applies when we have MISSING DATA (NA). I also agree with Ivan. Zero is zero. However, in some cases (e.g. Ordinary Kriging) it is necessary to transform the zero values (e.g. to 0.001).
I agree with Ivan and Oliver. The question seems to me to indicate great confusion in Arnab's position. No rainfall does not mean a missing datum, but a 0 for that datum, full stop. If the datum for that month was not collected, one does not know whether there was no rain, and it can take any value, zero included: it is a missing datum with no chance of evaluating its value. "Interpolating" should be different from 'inventing' a datum, and inventing data should strictly be avoided, otherwise you bias your time series.