I have a question: is there any guidance on how to choose the lag distance? Say I have two composite data, 5 and 10 respectively. I process these data in Surpac to do variogram analysis.
Variogram models are mostly used for adjusting final predictions in geostatistical simulation. Current practice works with four basic models of regionalization. To review the results of model fitting, the best place to look is the following PDF file, and in particular:
a) how to fit variograms,
b) how to model the variogram, especially for processes that exhibit a nugget effect,
c) what a reasonable approach to variogram modelling looks like,
d) how to estimate the variogram model and visually examine each variogram using suitable empirical maps.
In this example, which will later be used to make predictions and to choose a lag distance, you can start from the default values for the initial variogram (many functions in common software are set up automatically, so a geostatistical analysis can proceed without manual guidance).
Your lag distance is essentially controlled by two features. The first is the distance between samples: it would be of no value whatsoever to use a lag distance much smaller than the spacing between your samples, as the majority of lags would contain no data. A good rule is to use a lag spacing of half the average nearest-neighbour distance. The second feature is the variability of the dataset and whether there is any periodicity in the data. If there is, use at most half the periodic distance; with a lag greater than this you stand the chance of missing the periodic structure completely.
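As a minimal sketch of that rule of thumb (not Surpac's own routine; numpy/scipy and the coordinates are my own invented example):

```python
# Rule-of-thumb starting lag: half the average nearest-neighbour distance.
# numpy/scipy assumed; coordinates are a hypothetical example, not real data.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(200, 2))   # hypothetical sample locations (m)

tree = cKDTree(coords)
dist, _ = tree.query(coords, k=2)              # k=2: nearest neighbour of a point is itself
avg_nn = dist[:, 1].mean()                     # average distance to the true nearest neighbour
print(f"average nearest-neighbour distance: {avg_nn:.1f} m")
print(f"suggested starting lag: {0.5 * avg_nn:.1f} m")
```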
While I essentially agree with the preceding comments, for me the choice of lag distance should not be based exclusively on a rule about the distance between measurement points, but also on knowledge of how the different factors being modelled vary in space and time at different scales, as well as on the reliability / measurement error of the input data layer(s).
What spatiotemporal variations of the input parameter are likely to enter into your modelling? Review the literature to determine the range of possible variation at different scales, particularly in contexts similar to the one you are studying. How well does your data really represent the trends that are likely to occur in your study area? Is the density of your input data points high enough to capture the trends you are interested in measuring? Are your continuous variables input from irregularly distributed point data or from raster data? Are the values measured data or estimates (interpolated data such as DEM altitudes)? Do you know the measurement error of the input point data?
1. One consideration is that the variation captured at your lag distance should be greater than the estimation error (see: nugget effect). If you are using high-density point layers with unknown or relatively high measurement error, I agree with Christian's proposal: the starting lag distance should cover a neighbourhood of roughly 25 to 30 points on average (a numeric reading of this is sketched below). This will avoid modelling local variation that is essentially due to measurement uncertainty, model overfitting, etc.
However, if your measurements are relatively reliable (you have checked the measurement error or estimate precision) but your measurement points are sparse and/or quite irregularly distributed in space (climate measurements, for example), then you may want to set the minimum lag distance to cover only a small number of measurement sites. In this case it is advisable to check (with a raster calculation) the number of input points used to calculate output values across your study area, and perhaps exclude from the study certain areas that have too few input points.
The document at http://faculty.washington.edu/edford/Variogram.pdf gives a pretty good view of nugget and lag effects depending on data accuracy and point density.
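One way to turn that 25-30 point neighbourhood suggestion into a number (my reading of it, not a prescribed formula) is to take, for each sample, the distance enclosing about that many neighbours and average it; a rough sketch, assuming numpy/scipy and an invented point layer:

```python
# One reading of "lag covering ~25-30 points": for each sample, find the
# distance to its 25th nearest neighbour and average over all samples.
# numpy/scipy assumed; the dense point layer is invented.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
coords = rng.uniform(0, 1000, size=(300, 2))   # hypothetical high-density layer

tree = cKDTree(coords)
dist, _ = tree.query(coords, k=26)             # 25 neighbours + the point itself
start_lag = dist[:, -1].mean()                 # radius holding ~25 neighbours on average
print(f"candidate starting lag: {start_lag:.1f} m")
```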
2. A second issue is that of directional patterns in geographic variation. Is your autocorrelation anisotropic? You can visualise the data and autocorrelation layers to verify whether, at key lag distances, there is a spatial orientation in your study area. The same phenomenon modelled from the same input sources can show very different anisotropic structure depending on the study area (modelling temperatures in plains versus mountains, for example) or between seasons.
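A quick way to check for that kind of anisotropy is to compare the average semivariance of two perpendicular direction classes at the same lag; a hedged numpy sketch with invented data (an artificial E-W drift is added so the two directions actually differ):

```python
# Anisotropy check: compare semivariance of N-S and E-W pair classes in one
# lag band. Pure numpy; data are invented, with an artificial E-W drift.
import numpy as np

rng = np.random.default_rng(2)
coords = rng.uniform(0, 1000, size=(200, 2))
values = 0.002 * coords[:, 0] + rng.normal(scale=0.3, size=200)  # E-W drift + noise

d = coords[:, None, :] - coords[None, :, :]
h = np.hypot(d[..., 0], d[..., 1])                        # pair separation distances
g = 0.5 * (values[:, None] - values[None, :]) ** 2        # half squared differences
az = np.degrees(np.arctan2(d[..., 0], d[..., 1])) % 180   # azimuth: 0 = N-S, 90 = E-W

iu = np.triu_indices_from(h, k=1)                         # each pair once
h, g, az = h[iu], g[iu], az[iu]

band = (h > 100) & (h < 200)                              # one lag class, 100-200 m
ns = band & ((az < 22.5) | (az > 157.5))                  # N-S +/- 22.5 deg
ew = band & (np.abs(az - 90) < 22.5)                      # E-W +/- 22.5 deg
print(f"N-S semivariance: {g[ns].mean():.3f} ({ns.sum()} pairs)")
print(f"E-W semivariance: {g[ew].mean():.3f} ({ew.sum()} pairs)")
```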
Your question, or rather your problem, is not clear. You say you have "two composite data, 5 and 10". Do you mean you have two data sets or two data values, and exactly how were the data "composited"?

First, recognize that geostatistics is based on the assumption that the data are a non-random sample from a partial realization of a random function (some authors use the term "random field"). The random function is assumed to satisfy certain stationarity conditions. Assuming these are satisfied, the variogram is a characteristic of the random function (not of the data). Actually computing the variogram would require knowing the multivariate probability distribution of the random function, but in practice only a data set is known. The variogram must also satisfy certain mathematical conditions, e.g. it must be conditionally negative definite. Hence in practice one must use the data to estimate/model the variogram, thus the use of a sample (experimental) variogram.

For a given data set the sample variogram is not unique: it depends not only on the data values and the pattern of the data locations but also on the choice of lags. The data are not multiple samples; they are only one sample. The sample variogram is obtained by computing the "half squared differences", i.e. for each pair of data locations compute half the squared difference of the data values. The total number of half squared differences is completely fixed by the number of data locations; the choice of the lag distances determines how that set of half squared differences is split up into subsets. Ideally you want as many subsets as possible but also as many differences per subset as possible, and these are conflicting objectives. As noted by another responder, if you choose the lag distance too small then some subsets will contain no half squared differences; if you choose it too large you will not have enough plotted points to imply the shape of the variogram model, so you need to do some experimenting.

I have not used Surpac, but note that it is advertised as a geology modelling and mine planning package. I suggest that you begin with an open-source geostatistics package such as gstat (R) or SGeMS, because those have friendlier and more extensive options for variogram modelling. Note the comments above about possible geometric anisotropies: you need to look at both omnidirectional and directional sample variograms, but the directional sample variograms will always have fewer half squared differences per lag interval and hence are often harder to interpret.
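To make the "half squared differences" bookkeeping concrete, here is a small numpy sketch of the sample variogram as described above (invented coordinates and values; real software such as gstat additionally handles direction tolerances, trends, etc.):

```python
# Sample (experimental) variogram: half squared differences binned by
# separation distance, showing how the lag width splits a fixed set of
# pairs into subsets. Data are invented.
import numpy as np

rng = np.random.default_rng(3)
coords = rng.uniform(0, 1000, size=(150, 2))   # hypothetical data locations
values = rng.normal(size=150)                  # hypothetical data values

d = coords[:, None, :] - coords[None, :, :]
h = np.hypot(d[..., 0], d[..., 1])
g = 0.5 * (values[:, None] - values[None, :]) ** 2

iu = np.triu_indices_from(h, k=1)              # n(n-1)/2 pairs, fixed by the data
h, g = h[iu], g[iu]

lag = 50.0                                     # the lag width being chosen
bins = np.arange(0.0, h.max() + lag, lag)
idx = np.digitize(h, bins)
for b in range(1, len(bins)):
    m = idx == b
    if m.any():
        print(f"[{bins[b-1]:6.0f}, {bins[b]:6.0f}) m: "
              f"{m.sum():5d} pairs, semivariance {g[m].mean():.3f}")
```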
You always need to do some exploratory analysis of the data, both before computing sample variograms and possibly again afterwards: e.g. compute a histogram of the data values, make a plot of the data locations coded by the data values, and fit a trend surface to the data. You also need to ask what you know about the phenomenon that supposedly generated the data and how that information might relate to spatial correlation.
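A minimal sketch of two of those exploratory steps, using the same invented data as the previous snippet (the first-order trend surface is fitted by ordinary least squares):

```python
# Exploratory pass: histogram of the values, plus a first-order trend
# surface z = b0 + b1*x + b2*y fitted by least squares. Invented data.
import numpy as np

rng = np.random.default_rng(3)
coords = rng.uniform(0, 1000, size=(150, 2))   # same hypothetical data as above
values = rng.normal(size=150)

hist, _ = np.histogram(values, bins=20)
print("histogram counts:", hist)

A = np.column_stack([np.ones(len(coords)), coords])
beta, *_ = np.linalg.lstsq(A, values, rcond=None)
resid = values - A @ beta
print("trend coefficients:", beta)
print("residual std dev:", resid.std())
```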
If you are using drill cores where the data values are assays for cores, then one of the best references is Mining Geostatistics, 2nd ed., A. G. Journel and Ch. J. Huijbregts, Academic Press. Most discussions of geostatistics and variogram modelling assume that the data are "point values"; assays of cores are not point values, and you have to use "regularized" variogram models.