There is no single right number of pairs of points. It will vary depending on the variable that you are sampling, and probably on the field conditions where the variable is measured. In the literature you will find a lot of suggested numbers, but the appropriate one will differ in each case.
There is a discussion on RG very similar to this question; please see the link below:
The distribution of the 20 points over your 5.87 ha is very important.
If they are too close to each other (in position, i.e. coordinates) then you can't use them.
Now the question is: how close is "close", and how distant is "distant"?
You can divide your 5.87 ha into 10 x 10 m, 25 x 25 m, 50 x 50 m or 100 x 100 m plots. If at least one point (from the 20 points) falls into each plot, then you can go ahead.
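The plot check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field is approximated as a 293.5 m x 200 m rectangle (which gives 5.87 ha), the 20 coordinates are randomly generated stand-ins for the real sample locations, and `grid_occupancy` is a hypothetical helper name.

```python
import math
import random

def grid_occupancy(points, cell_size, width, height):
    """Return (occupied_cells, total_cells) for square cells laid over a rectangle."""
    n_cols = math.ceil(width / cell_size)
    n_rows = math.ceil(height / cell_size)
    occupied = set()
    for x, y in points:
        # Clamp so that points lying exactly on the far edge fall in the last cell
        col = min(int(x // cell_size), n_cols - 1)
        row = min(int(y // cell_size), n_rows - 1)
        occupied.add((row, col))
    return len(occupied), n_rows * n_cols

# ASSUMPTION: rectangular field of 293.5 m x 200 m = 58,700 m^2 = 5.87 ha
WIDTH, HEIGHT = 293.5, 200.0

# ASSUMPTION: synthetic sample locations; replace with the real 20 coordinates
rng = random.Random(0)
points = [(rng.uniform(0, WIDTH), rng.uniform(0, HEIGHT)) for _ in range(20)]

for cell in (10, 25, 50, 100):
    occ, total = grid_occupancy(points, cell, WIDTH, HEIGHT)
    print(f"{cell} x {cell} m plots: {occ} of {total} contain at least one point")
```

Note that 20 points can occupy at most 20 plots, so on a rectangle like this one only the coarsest grid (6 plots of 100 x 100 m) can be fully covered. For the finer grids the share of occupied plots is more informative than a pass/fail answer.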
1. The phrase "perform geostatistics in kriging" is meaningless.
2. Do you want to use Ordinary kriging or Universal kriging? The statistical assumptions are different!
3. You must distinguish between estimating/fitting a variogram model and using that variogram model in kriging; these are two different problems.
4. In both problems (steps) it is not sufficient to know the minimum number of data locations; you must also know the spatial pattern of the data locations.
Usually, to estimate/fit a variogram model you will use an empirical variogram, so you must consider how the number and spatial pattern of the data locations affect the empirical variogram. See:
1987, Warrick, A. and Myers, D.E., Optimization of Sampling Locations for Variogram Calculations, Water Resources Research 23, 496-500
1991, Myers, D.E., On Variogram Estimation, in Proceedings of the First Inter. Conf. Stat. Comp., Cesme, Turkey, 30 Mar.-2 April 1987, Vol II, American Sciences Press, 261-281
1991, Myers, D.E., Interpolation and Estimation with Spatially Located Data, Chemometrics and Intelligent Laboratory Systems 11, 209-228
1994, Myers, D.E., Spatial Interpolation: An Overview, Geoderma 62, 17-28
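To see how the number and pattern of data locations feed into the empirical variogram, here is a minimal sketch of the classical (Matheron) estimator that also reports how many pairs fall into each lag bin. It is an illustration under assumptions, not a recommended workflow: the coordinates and values are synthetic, the 50 m lag width and 8 lags are arbitrary choices, and `empirical_variogram` is a hypothetical helper name.

```python
import math
import random
from itertools import combinations

def empirical_variogram(coords, values, lag_width, n_lags):
    """Matheron estimator: per lag bin, (mean distance, semivariance, pair count)."""
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])
        k = int(h // lag_width)          # index of the lag bin for this pair
        if k < n_lags:
            sq_sums[k] += (values[i] - values[j]) ** 2
            dist_sums[k] += h
            counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            result.append((dist_sums[k] / counts[k],
                           sq_sums[k] / (2 * counts[k]),
                           counts[k]))
        else:
            result.append((None, None, 0))   # empty bin: no estimate possible
    return result

# ASSUMPTION: 20 synthetic locations in a 293.5 m x 200 m field, synthetic values
WIDTH, HEIGHT = 293.5, 200.0
rng = random.Random(1)
coords = [(rng.uniform(0, WIDTH), rng.uniform(0, HEIGHT)) for _ in range(20)]
values = [rng.gauss(10.0, 2.0) for _ in coords]

n_pairs = len(coords) * (len(coords) - 1) // 2   # 20 points give 190 pairs in total
variogram = empirical_variogram(coords, values, lag_width=50.0, n_lags=8)

print(f"total pairs: {n_pairs}")
for k, (mean_h, gamma, count) in enumerate(variogram):
    if count:
        print(f"lag {k}: mean h = {mean_h:.1f} m, gamma = {gamma:.3f}, pairs = {count}")
    else:
        print(f"lag {k}: no pairs")
```

The 190 pairs from 20 points are spread unevenly over the lag bins, and the spread depends entirely on the spatial pattern of the locations. Bins with only a handful of pairs give unstable semivariance values, which is why the pattern matters as much as the count.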
When you then use the fitted variogram model in the kriging equations (Ordinary or Universal), the effect of the number of data locations and their spatial pattern is different. Moreover, with a moving search neighborhood, you won't want too many data locations in the neighborhood.
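As a rough illustration of this second step, the sketch below solves the Ordinary kriging system at one target location using only the nearest data locations (a moving search neighborhood). Everything specific here is an assumption for the sake of the example: the exponential variogram model and its sill and range, the neighborhood size of 8, the synthetic data, and the helper names `exp_variogram`, `solve` and `ordinary_kriging`.

```python
import math
import random

def exp_variogram(h, sill=1.0, range_=100.0):
    """Exponential variogram model without nugget (ASSUMED parameters)."""
    return sill * (1.0 - math.exp(-3.0 * h / range_))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(coords, values, target, max_neighbors=8, gamma=exp_variogram):
    """Return (estimate, kriging variance, weights) at target."""
    # Moving neighborhood: keep only the max_neighbors nearest data locations
    order = sorted(range(len(coords)), key=lambda i: math.dist(coords[i], target))
    idx = order[:max_neighbors]
    n = len(idx)
    # Kriging matrix: variogram values between data locations, bordered by the
    # unbiasedness constraint (weights must sum to 1)
    a = [[gamma(math.dist(coords[i], coords[j])) for j in idx] + [1.0] for i in idx]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(coords[i], target)) for i in idx] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]          # mu is the Lagrange multiplier
    estimate = sum(w * values[i] for w, i in zip(weights, idx))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights

# ASSUMPTION: 20 synthetic locations and values in a 293.5 m x 200 m field
rng = random.Random(2)
coords = [(rng.uniform(0, 293.5), rng.uniform(0, 200.0)) for _ in range(20)]
values = [rng.gauss(10.0, 2.0) for _ in coords]

est, var, w = ordinary_kriging(coords, values, target=(150.0, 100.0))
print(f"estimate = {est:.3f}, kriging variance = {var:.3f}, neighbors used = {len(w)}")
```

The variogram model enters only through `gamma`, so the quality of the result depends on the earlier fitting step. The `max_neighbors` argument is where the size of the search neighborhood is controlled.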
There is no one-size-fits-all criterion for this; however, some rules of thumb have been provided. In addition to the sample size, the spatial distribution pattern of the samples is also discussed. In general, the quality of the Ordinary kriging (OK) results improves with increasing data density and a more regular spatial distribution.