I have a time series of daily precipitation data. There are a lot of missing data. In some cases, data are missing for many consecutive days. What is the best method for filling precipitation data?
Regression may help. You build the regression model on the days for which you already have the true (observed) values of the dependent variable, and then use the model's predicted values to estimate the missing ones, including further back in history if you need that.
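To make the regression idea concrete, here is a minimal Python sketch. It assumes you have a second, complete series to regress against (a hypothetical neighboring station called `neighbor` here), which the original question does not confirm. The function names and the toy numbers are made up for illustration, and a plain straight-line fit is only one possible choice of model.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fill_by_regression(target, neighbor):
    """Fill None entries in `target` from `neighbor` using a fitted line.

    The line is fitted only on days where both series have a value.
    """
    pairs = [(n, t) for t, n in zip(target, neighbor)
             if t is not None and n is not None]
    a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    filled = []
    for t, n in zip(target, neighbor):
        if t is None and n is not None:
            # Precipitation cannot be negative, so clip the prediction at zero.
            filled.append(max(0.0, a + b * n))
        else:
            filled.append(t)
    return filled


# Toy data (hypothetical, in mm/day): None marks a missing day.
target = [0.0, 2.0, None, 4.0, None, 6.0]
neighbor = [0.0, 1.0, 1.5, 2.0, 2.5, 3.0]
filled = fill_by_regression(target, neighbor)
```

On real daily rainfall the relationship is rarely this clean, because many days are dry and the wet-day amounts are strongly skewed, so treat the fitted line as a first check rather than a finished method.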
It depends on what you aim to do afterwards. If you are just doing exploratory analysis or are interested in precise predictions, then a single imputation method will do the job.
There are a lot of methods available; the Amelia or VIM packages for R may be of interest to you. Amelia can handle time series, and VIM includes some commonly known imputation methods such as k-nearest-neighbor imputation and regression imputation.
With both packages you can also multiply impute your missing values. This is preferable if you want to do some variance estimation as well.
Kindly refer to the National Climate Center (NCC), Office of the Additional Director General of Meteorology (Research), Pune, IMD, MOES, India. NCC has done this for the whole of India.
One method would be to download satellite data (TRMM would be the best because of its rather fine spatial and temporal (3 h) resolution). Then extract the TRMM values at the site of your time series and compare them with the values in your series. If you judge that the correlation is good enough, use TRMM (averaged over 1 day) to fill your series. If you judge that there is a bias, or that some other correction is needed to adjust the satellite values, apply this bias or correction to the filled values.
(I have already performed such an exercise on buoy time series with various model and satellite data. This can lead to pretty good results.)
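The satellite workflow described above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: the satellite values are taken to be 3-hour accumulations in mm already extracted at the gauge location (if your product gives rain rates, convert them first), the correction is a simple multiplicative scaling factor, and the 0.7 correlation threshold is arbitrary. The function names and numbers are hypothetical.

```python
import math


def daily_totals(three_hourly):
    """Sum consecutive groups of eight 3-hour accumulations (mm) into daily totals."""
    return [sum(three_hourly[i:i + 8]) for i in range(0, len(three_hourly), 8)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fill_with_satellite(gauge, sat_daily, min_corr=0.7):
    """Fill None entries in `gauge` with bias-scaled satellite daily totals.

    Returns (filled series, correlation, scaling factor). Raises ValueError
    if the correlation on the overlapping days is below `min_corr`.
    """
    pairs = [(g, s) for g, s in zip(gauge, sat_daily) if g is not None]
    gauge_obs = [p[0] for p in pairs]
    sat_obs = [p[1] for p in pairs]
    r = pearson(gauge_obs, sat_obs)
    if r < min_corr:
        raise ValueError("satellite and gauge series are too weakly correlated")
    # Multiplicative bias: ratio of total gauge rain to total satellite rain.
    scale = sum(gauge_obs) / sum(sat_obs)
    filled = [g if g is not None else scale * s
              for g, s in zip(gauge, sat_daily)]
    return filled, r, scale


# Toy data (hypothetical): four days of 3-hourly satellite accumulations in mm.
sat_3h = [0.0] * 8 + [1.0] * 8 + [0.5] * 8 + [2.0] * 8
sat_daily = daily_totals(sat_3h)
gauge = [0.0, 10.0, None, 20.0]
filled, r, scale = fill_with_satellite(gauge, sat_daily)
```

A single scaling factor is the crudest possible correction. With a long enough overlap you could estimate it per month or per season instead.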
From your question it is not clear whether you have access to precipitation data from neighboring stations, or how long your records are. Some time ago I researched exactly this problem, making a systematic comparison between the available algorithms. My main result for the data at hand (30 years of daily precipitation values, midlatitude, without mountains and so on) was that any sort of multivariate approach gives dramatically better results than considering just one weather station's time series. Check my dissertation at http://www.thedigitalmap.com/~carlos/papers/PhDthesis/thesis.htm
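As one simple illustration of using several stations instead of one, here is a Python sketch of inverse distance weighting (IDW). This is not the method from the dissertation, just a common baseline picked for the example. The station coordinates, the exponent `power=2.0`, and all names are hypothetical.

```python
import math


def idw_estimate(target_xy, neighbors, power=2.0):
    """Inverse-distance-weighted estimate at `target_xy`.

    `neighbors` is a list of ((x, y), value) pairs; neighbors whose value
    is None are skipped. Returns None if no neighbor has a value.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in neighbors:
        if value is None:
            continue
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        if d == 0.0:
            return value  # a station at the same location: use it directly
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den if den > 0.0 else None


def fill_from_neighbors(target_series, target_xy, neighbor_xy, neighbor_series,
                        power=2.0):
    """Fill None entries in `target_series` day by day from neighboring stations."""
    filled = []
    for day, value in enumerate(target_series):
        if value is None:
            todays = [(xy, series[day])
                      for xy, series in zip(neighbor_xy, neighbor_series)]
            value = idw_estimate(target_xy, todays, power)
        filled.append(value)
    return filled


# Toy data (hypothetical): two neighbors, three days, None marks a gap.
target_series = [1.0, None, None]
neighbor_xy = [(1.0, 0.0), (0.0, 2.0)]
neighbor_series = [[1.0, 4.0, None], [2.0, 8.0, None]]
filled = fill_from_neighbors(target_series, (0.0, 0.0),
                             neighbor_xy, neighbor_series)
```

Note that a day stays missing when every neighbor is missing too, which is the situation described later in this thread.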
If you check the atmospheric physics literature and the geostatistics literature, you will find a lot of papers devoted to this question, and in particular to the use of kriging and cokriging. Depending on the local atmospheric conditions, the size of the region of interest, and the variability in elevation, it may be necessary to incorporate elevation as an auxiliary variable. This can be done in at least two different ways, and there is no universally applicable justification for which of the two is better.
I have a problem like yours. Can I fill the missing values in daily rainfall data with linear interpolation? I used data for 4 months (108 rows), of which 25% are missing values. The neighboring station has no data either.
Triana, you need something more sophisticated than linear interpolation. As I suggested above, you need to look at the literature, including the geostatistics literature. There is also the question of what kind of data you have, i.e., precipitation for many days and many gauge locations but not all locations on some days, and/or precipitation for many gauge locations but not all days at all locations. This makes it a space-time problem, or it limits the information you have for any one interpolation. You may also have to take elevation into account. Regression is not adequate for any of these variations, and linear interpolation certainly is not. Check the literature.