Thank you, Padmanabhan sir. I would like to know about the statistical validity of the estimate. For kriging we do variogram analysis. What is the minimum number of samples needed to get a variogram that captures the spatial relationship to a reasonable degree of accuracy, below which we can say that the data are not sufficient to carry out kriging?
Very practical: for every lag class of your experimental variogram you need at least 35 pairs; with fewer, the variogram is not accurate enough. This is not a theoretical rule but one deduced from practice. So make the lag intervals large enough to include more pairs when producing the experimental variogram. This will also give you interesting information about your data. For the kriging itself, take a panel of the dimensions you need in the centre of your field and try kriging it with different numbers of points around the panel. As you gradually increase the search radius and check the kriging coefficient (weight) of every data point used in the estimate, you will see that beyond a certain distance the coefficients of the remote points become negative. Stop there and choose the radius that gives only positive coefficients. Success!
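The radius check described above can be sketched roughly as follows. This is only an illustration under assumptions: the spherical model, its sill and range, the random data locations and the list of radii are all made up, and whether or where negative weights appear depends on the model and on the spatial pattern of the locations.

```python
import numpy as np

def spherical(h, nugget=0.0, sill=1.0, a=30.0):
    """Spherical variogram; nugget, sill and range `a` are illustrative values."""
    h = np.asarray(h, dtype=float)
    s = np.minimum(h / a, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * s - 0.5 * s**3)
    return np.where(h > 0, gamma, 0.0)  # gamma(0) = 0 by definition

def ok_weights(coords, target, variogram):
    """Solve the ordinary kriging system; return (weights, Lagrange multiplier)."""
    coords = np.asarray(coords, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)          # variogram between data locations
    A[n, n] = 0.0                     # unbiasedness constraint row/column
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n]

def weights_by_radius(coords, target, variogram, radii):
    """For each search radius: (radius, n used, smallest weight, n negative)."""
    coords = np.asarray(coords, dtype=float)
    target = np.asarray(target, dtype=float)
    dist = np.linalg.norm(coords - target, axis=1)
    out = []
    for r in radii:
        idx = np.flatnonzero(dist <= r)
        if len(idx) == 0:
            continue                  # nothing inside this radius
        w, _ = ok_weights(coords[idx], target, variogram)
        out.append((r, len(idx), float(w.min()), int((w < 0).sum())))
    return out

# Hypothetical field: 80 random data locations, target at the centre.
gen = np.random.default_rng(42)
locations = gen.uniform(0.0, 100.0, size=(80, 2))
for row in weights_by_radius(locations, [50.0, 50.0], spherical, [10, 20, 30, 40, 60]):
    print("radius=%g  n=%d  min weight=%.4f  negative=%d" % row)
```

Note that the data values never enter the weight calculation; only the locations and the variogram model do.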
There are two different questions: (1) the number of data locations (as well as the spatial pattern) used to estimate/model the variogram, and (2) the number of data locations (as well as the spatial pattern) used in the kriging equations. There is no minimum number for the first question that is always sufficient, although you will find various statements in the literature that claim there is. There are two different aspects of estimating/modeling the variogram: the first is to select the variogram model (e.g. spherical) and the second is to determine the parameter values in the model. It is common practice to compute and plot empirical variograms, but for a given data set these are not unique. The total number of data location pairs is completely determined by the number of data locations; however, this number does not determine the number of pairs in each distance class/angle window. Those numbers are affected by the width of the distance classes and the angle window tolerance, so it will always be a compromise. The question you are asking is far more complicated than you recognize and does not have a simple answer.
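To illustrate the non-uniqueness, here is a minimal sketch of an omnidirectional empirical variogram that also reports the number of pairs in each distance class. The data locations, values and lag widths are invented for illustration, and the `min_pairs=35` default is just the rule of thumb quoted in this thread, not a requirement.

```python
import math
import random
from itertools import combinations

def empirical_variogram(coords, values, lag_width, min_pairs=35):
    """Omnidirectional empirical semivariogram, one row per non-empty distance class."""
    sums, counts = {}, {}
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        k = int(math.dist(p, q) // lag_width)        # index of the distance class
        sums[k] = sums.get(k, 0.0) + 0.5 * (zp - zq) ** 2
        counts[k] = counts.get(k, 0) + 1
    return [
        {
            "lag_from": k * lag_width,
            "lag_to": (k + 1) * lag_width,
            "pairs": counts[k],
            "gamma": sums[k] / counts[k],
            "enough_pairs": counts[k] >= min_pairs,
        }
        for k in sorted(counts)
    ]

# Hypothetical data: 40 random locations with uncorrelated values.
gen = random.Random(0)
coords = [(gen.uniform(0, 100), gen.uniform(0, 100)) for _ in range(40)]
values = [gen.gauss(0, 1) for _ in range(40)]

# Same data, three lag widths: the total pair count is fixed at 40*39/2 = 780,
# but the count per class (and hence the plotted variogram) changes.
for width in (5.0, 10.0, 20.0):
    rows = empirical_variogram(coords, values, width)
    total = sum(r["pairs"] for r in rows)
    sparse = sum(1 for r in rows if not r["enough_pairs"])
    print(f"width={width}: classes={len(rows)}, total pairs={total}, classes below 35 pairs={sparse}")
```

An angle window could be added in the same way, and would thin out the pairs per class further.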
The variogram model is a function satisfying certain conditions, e.g. it must be conditionally negative definite. The kriging equations will have a unique solution for any function satisfying these conditions, but there is still the question of whether to use a moving search neighborhood or a unique neighborhood. If you are using a moving search neighborhood then 20-25 data locations in the neighborhood are sufficient and may be too many, but note the phrase "in the neighborhood".
There is no such thing as an "accurate" variogram and you cannot determine the "accuracy of kriging". The kriging variance does not depend on the data values, only on the variogram model and its parameters, the numbers of data locations in the search neighborhoods and the spatial patterns of those data locations.
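For reference, this can be read directly off the ordinary kriging equations written in variogram form. The notation is my own choice for this note: x_1, ..., x_n are the data locations in the search neighborhood, x_0 is the location being estimated, gamma is the variogram model, lambda_i are the weights and mu is the Lagrange multiplier.

```latex
\begin{aligned}
\sum_{j=1}^{n} \lambda_j \,\gamma(x_i - x_j) + \mu &= \gamma(x_i - x_0), \qquad i = 1, \dots, n, \\
\sum_{j=1}^{n} \lambda_j &= 1, \\
Z^{*}(x_0) &= \sum_{i=1}^{n} \lambda_i \, Z(x_i), \\
\sigma^2_{OK}(x_0) &= \sum_{i=1}^{n} \lambda_i \,\gamma(x_i - x_0) + \mu .
\end{aligned}
```

The data values Z(x_i) appear only in the estimate. Neither the weights nor the kriging variance involve them, so two data sets at the same locations with the same variogram model give identical kriging variances.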
You should begin by asking what you know about the phenomenon that generates the data, e.g. is the spatial correlation likely to have a directional dependence, and is there a plausible guess as to the range of the spatial correlation? Is the data cheap or expensive, and is it easy or difficult to get? Is it "point" data or does it have non-point support? Once you have some data you will want to use various exploratory statistics to gain some insight into the data. You may want to experiment with different variogram models and different search neighborhood parameters, and look at the cross validation statistics.
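As a rough sketch of what comparing cross validation statistics can look like, here is leave-one-out cross validation with ordinary kriging in a unique neighborhood. Everything specific is an assumption for illustration: the exponential model, the two candidate range values, and the synthetic data.

```python
import numpy as np

def exponential(h, sill=1.0, a=20.0):
    """Exponential variogram; sill and range parameter `a` are illustrative."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))

def ok_predict(coords, values, target, variogram):
    """Ordinary kriging estimate and kriging variance at one target location."""
    n = len(coords)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(np.linalg.norm(coords[:, None] - coords[None, :], axis=2))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    return float(w @ values), float(w @ b[:n] + mu)

def loo_cross_validation(coords, values, variogram):
    """Drop each location in turn, krige it from the rest, summarise the errors."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    errors, z_scores = [], []
    for k in range(len(values)):
        keep = np.arange(len(values)) != k
        est, var = ok_predict(coords[keep], values[keep], coords[k], variogram)
        errors.append(est - values[k])
        z_scores.append((est - values[k]) / np.sqrt(var))
    errors, z_scores = np.array(errors), np.array(z_scores)
    return {
        "mean_error": float(errors.mean()),              # ideally near 0
        "rmse": float(np.sqrt((errors**2).mean())),
        "mean_sq_std_error": float((z_scores**2).mean()),  # ideally near 1
    }

# Synthetic example: a smooth trend plus noise at 50 random locations.
gen = np.random.default_rng(7)
xy = gen.uniform(0.0, 100.0, size=(50, 2))
z = np.sin(xy[:, 0] / 15.0) + 0.1 * gen.normal(size=50)
for a in (10.0, 40.0):  # two candidate range parameters
    print(a, loo_cross_validation(xy, z, lambda h, a=a: exponential(h, a=a)))
```

These statistics help compare candidate models and neighborhood settings against each other; they do not certify that any one variogram is "correct".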
I understand that sufficient spatial spread is as important as the number of samples. I am speaking from the angle of oil and gas fields that are yet to be developed, where we have just 3 or 4 wells which act as hard data sample points. Can we run a meaningful variogram analysis using just 3 or 4 sample points and use some soft data, such as seismic data, to co-krig, since that can be acquired over a denser grid, provided there is a good correlation between the seismic property and the reservoir property of interest encountered in the wells?
1. You need to look at the oil and gas geostatistics literature because the problems there are quite different from many other areas of application, in particular the example you mention. AAPG has a two-volume review of stochastic methods for oil and gas that is worth looking at.
2. You mention 3 or 4 wells, but the important question is what kind of data you have from the wells, i.e. these would correspond to only 3 or 4 locations horizontally but there may be data at many different vertical levels, depending on the variables of interest.
3. You could certainly compute a sample variogram with only 3 or 4 data locations but it is likely to be nearly useless for trying to fit a model variogram.
4. You should ask about available "expert knowledge", i.e. a geologist/engineer who can compare the site with similar sites that have already been exploited. As a starting point, perhaps use a variogram model from a similar site, and use the assumed model to generate kriged values which might be used for history matching.
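To see why point 3 holds, it helps to count the horizontal pairs. A quick sketch, with four made-up well coordinates in metres (the names and positions are purely hypothetical):

```python
import math
from itertools import combinations

# Hypothetical well head locations (x, y) in metres.
wells = {
    "W1": (0.0, 0.0),
    "W2": (1200.0, 300.0),
    "W3": (400.0, 1500.0),
    "W4": (1800.0, 1700.0),
}

# Every unordered pair of wells with its horizontal separation distance.
pairs = sorted(
    (math.dist(a, b), name_a, name_b)
    for (name_a, a), (name_b, b) in combinations(wells.items(), 2)
)

print(f"{len(wells)} wells give {len(pairs)} pairs in total")
for d, name_a, name_b in pairs:
    print(f"{name_a}-{name_b}: {d:.0f} m")
```

Four locations give only 4*3/2 = 6 pairs, spread over six different separation distances, so any horizontal distance class holds one or two pairs at best. The vertical direction along each well may be a different story, as noted in point 2.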
To Stefan and James
35 data locations is neither a minimum nor a maximum; it all depends on the particular problem. If it is easy and inexpensive to collect data at a lot of locations then, generally speaking, the more data locations the better, but the spatial pattern of the data locations is also important. A regular grid is almost always inefficient, i.e. for a given number of data locations you do not get a good spatial pattern. Ideally you want lots of pairs of locations for each of many different separation distances, especially shorter distances. Since you won't know or have much of an idea as to the possible range of the variogram (prior to collecting data), it becomes a problem of trying to extract as much information as possible from the data you do have, while recognizing that you may find you do not have the right data. For example, if you choose a regular grid with a 10 m spacing and it turns out that the range is actually less than 10 m, you will never discover that from your data, since you will not have any pairs with separation distances less than 10 m.
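The 10 m grid example is easy to check numerically. In this small sketch the 10 x 10 grid and the four extra close-spaced locations (offset by 2 m) are assumptions chosen only to make the point:

```python
import math
from itertools import combinations

def min_separation(points):
    """Smallest distance between any two locations."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))

def pairs_shorter_than(points, h):
    """Number of location pairs separated by less than h."""
    return sum(1 for p, q in combinations(points, 2) if math.dist(p, q) < h)

# Regular 10 x 10 grid with 10 m spacing.
grid = [(10.0 * i, 10.0 * j) for i in range(10) for j in range(10)]
print(min_separation(grid), pairs_shorter_than(grid, 10.0))

# Same grid plus a few hypothetical extra locations 2 m from existing ones.
extra = [(x + 2.0, y) for (x, y) in grid[::25]]
print(min_separation(grid + extra), pairs_shorter_than(grid + extra, 10.0))
```

The plain grid has no pair closer than 10 m, so the empirical variogram says nothing about shorter distances. Adding even a handful of close-spaced locations creates some short-distance pairs, which is the kind of design question the Warrick and Myers paper addresses.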
While you can't be sure that the spatial correlation for your problem is the same as one you find in the literature even if it is the same variable, looking at the literature to see what other researchers have found for similar problems is a good guide and starting point.
To sum it up, there is no minimum number of data locations that "ensure" optimum output when using kriging. "Optimum output" is a meaningless term when applied to kriging.
1990, A.W. Warrick, R. Zhang, M.M. Moody and D.E. Myers, Kriging Versus Alternative Interpolators: Errors and Sensitivity to Model Inputs, in Field-Scale Water and Solute Flux in Soils (Monte Verita), (eds) Roth, Flühler, and Jury, Birkhäuser Verlag, Basel
1991, D.E. Myers, Interpolation and Estimation with Spatially Located Data, Chemometrics and Intelligent Laboratory Systems 11, 209-228
1987, A. Warrick and D.E. Myers, Optimization of Sampling Locations for Variogram Calculations, Water Resources Research 23, 496-500
The problem is not as simple as you seem to suggest. First, you need to distinguish between an adequate number of data locations for estimating/fitting the variogram and an adequate number of data locations to use in the kriging equations (for the latter you also have to consider the distinction between Ordinary kriging and Universal kriging). Secondly, you have to consider the geographic extent of the study region as well as the spatial pattern of the data locations.
It is true that since IDW has no theoretical foundation the number of data locations may not have much impact on the value of the interpolation results. Although you might apply IDW with a small number of data locations you will also have no measure of how good the results are.
The minimum number of sample locations required depends on the total number of data locations and on the spatial pattern of the locations used to fit the variogram model.
Although some researchers set a minimum somewhere between 25 and 40 sample locations, and some use 28 as their minimum, there is no specific minimum number required. The more points there are within the specified study area, the better the kriging interpolation outputs.
Please read these articles for further explanations.
This is a nonsensical question: "optimal input through kriging". In addition, the data set is one non-random sample from a partial realization of a random function, so the question is not about "samples"; instead it is about the number and spatial pattern of the data locations. There is no minimum number of data locations that would be sufficient in every application. Journel (Mining Geostatistics) suggested a minimum of 50, but that was in connection with estimating and modeling the variogram. In the kriging equations that many data locations may lead to negative weights. Normally one uses a moving search neighborhood, so the total number of data locations might be much greater. R. Webster et al. suggested 35, but again this pertains to estimating and modeling the variogram, not to the kriging equations.
One of the best references for geostatistics is the book by J.-P. Chiles and P. Delfiner, "Geostatistics: Modeling Spatial Uncertainty", J. Wiley & Sons.
It is true you could apply IDW with only a few data locations, but the results are likely to be useless and you will not have any information as to the reliability of the results.