Three of the models you are using have both a sill and a range (for the Gaussian and exponential models it is only a practical range, since they approach the sill asymptotically), whereas the linear model has neither.
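To make the sill and range distinction concrete, here is a minimal sketch of the four model shapes. It assumes one common parameterization (exponential as 1 - exp(-h/a), Gaussian as 1 - exp(-(h/a)^2)), under which the practical range, where the model reaches about 95% of the sill, is 3a and sqrt(3)*a respectively. Some software folds those factors into the formula, so check the convention your package uses. The function names and the value a = 10 are illustrative only.

```python
import math

def spherical(h, sill=1.0, a=1.0):
    # Reaches the sill exactly at h = a and stays there.
    if h >= a:
        return sill
    r = h / a
    return sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, sill=1.0, a=1.0):
    # Approaches the sill asymptotically; about 95% of it at h = 3a.
    return sill * (1.0 - math.exp(-h / a))

def gaussian(h, sill=1.0, a=1.0):
    # Approaches the sill asymptotically; about 95% of it at h = sqrt(3) * a.
    return sill * (1.0 - math.exp(-(h / a) ** 2))

def linear(h, slope=1.0):
    # No sill and no range: grows without bound.
    return slope * h

a = 10.0  # illustrative range parameter
at_range = {
    "spherical": spherical(a, a=a),
    "exponential": exponential(3.0 * a, a=a),
    "gaussian": gaussian(math.sqrt(3.0) * a, a=a),
}
for name, value in at_range.items():
    print(name, round(value, 4))
print("linear at h = 1000:", linear(1000.0))
```

The spherical model sits exactly on the sill at its range, the other two bounded models are at roughly 0.95 of the sill at their practical ranges, and the linear model simply keeps increasing.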
Looking at the graph of the experimental variogram, does it appear to have a growth rate of 2 or greater, i.e. does it grow at least as fast as the square of the lag distance? There is no valid variogram model that fits such behavior, and this characteristic indicates that the mean is not constant, i.e. a critical statistical assumption is not satisfied.
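One rough way to check the growth rate is to fit a straight line to log(gamma) against log(h) and read off the slope. The sketch below does that with an ordinary least-squares fit. The lag and semivariance values are synthetic, made up for illustration (not taken from your data), and the function name is hypothetical. The growth condition is really about behavior at large lags, so treat the slope as a visual aid rather than a formal test.

```python
import math

def loglog_slope(lags, gammas):
    # Least-squares slope of log(gamma) versus log(h).
    xs = [math.log(h) for h in lags]
    ys = [math.log(g) for g in gammas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic experimental variogram that grows like h squared.
lags = [1.0, 2.0, 4.0, 8.0, 16.0]
gammas = [0.5 * h ** 2 for h in lags]

slope = loglog_slope(lags, gammas)
print("estimated growth rate:", round(slope, 3))
```

A slope near or above 2 at the larger lags is the warning sign described above; a slope well below 2 is consistent with a valid model such as the linear one.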
Why are you choosing nfold = 5? That makes interpreting the cross-validation statistics more difficult, or even invalid. With nfold = 5 you are partitioning the data set, and the code may be using a random number generator to do that, so the statistics can change from one run to the next.
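The effect of a random partition can be illustrated without any kriging at all. The sketch below is not the code your package uses; it only assumes that folds are assigned by shuffling, and shows that two seeds give two different partitions, whereas leave-one-out has exactly one possible partition. The function names and n = 20 are hypothetical.

```python
import random

def kfold_ids(n, k, seed):
    # Assign each of n points to one of k equally sized folds at random.
    rng = random.Random(seed)
    ids = [i % k for i in range(n)]
    rng.shuffle(ids)
    return ids

def leave_one_out_ids(n):
    # Every point is its own fold, so no randomness is involved.
    return list(range(n))

n = 20
folds_a = kfold_ids(n, 5, seed=1)
folds_b = kfold_ids(n, 5, seed=2)

print("seed 1:", folds_a)
print("seed 2:", folds_b)
print("same partition:", folds_a == folds_b)
print("leave-one-out:", leave_one_out_ids(n))
```

Because each partition leaves out different groups of points together, the cross-validation errors depend on the seed, which is what makes the 5-fold statistics harder to interpret.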
nmax = 100 is very large; reduce it to less than 25. The kriging matrix will be very large when nmax is so large, and that will likely cause numerical instability when inverting the matrix. The linear model is only conditionally negative definite, but the other three correspond to covariance models, which are positive definite.
It is not uncommon when writing software for kriging to "fake" a sill for a linear model and write the code as though it corresponded to a covariance model. The kriging matrix is easier to invert when the kriging equations are given in terms of a covariance function than when they are given in terms of a variogram.
You have not set a maximum distance for the search neighborhood, and with 100 data points in the neighborhood the implicit maximum distance could be very large and also not constant (it depends on the spatial arrangement of the data locations).
For the variogram models with a sill and a range, data locations that are farther away than the range will have little impact on the kriging weights, but for the linear model this is not the case.
As for negative data values, those will not have any effect on the cross validation statistics (using nfold -1). However, there is a big difference between negative data values and negative kriging weights. In ordinary kriging the weights must add to one; hence, if there are any negative weights, it is likely that there are also weights greater than one. Too large a value for nmax and/or too large a search neighborhood can easily result in negative kriging weights.
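A small worked case shows how a negative weight forces another weight above one. The sketch below solves the ordinary kriging system in variogram form for two data points in one dimension, at x = 1 and x = 2, with the target at x = 0, so the second point lies behind the first. The Gaussian model with range parameter 10, the point layout, and all function names are assumptions chosen for illustration; the solver is plain Gaussian elimination with partial pivoting.

```python
import math

def gaussian_variogram(h, sill=1.0, a=10.0):
    # Illustrative Gaussian model: sill * (1 - exp(-(h/a)^2)).
    return sill * (1.0 - math.exp(-(h / a) ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting on a small dense system.
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ok_weights(xs, x0, gamma):
    # Ordinary kriging system: variogram block bordered by the
    # unbiasedness constraint (weights sum to one).
    n = len(xs)
    A = [[gamma(abs(xi - xj)) for xj in xs] + [1.0] for xi in xs]
    A.append([1.0] * n + [0.0])
    b = [gamma(abs(xi - x0)) for xi in xs] + [1.0]
    return solve(A, b)[:n]  # drop the Lagrange multiplier

weights = ok_weights([1.0, 2.0], 0.0, gaussian_variogram)
print("weights:", [round(w, 3) for w in weights])
print("sum:", round(sum(weights), 6))
```

In this setup the nearer point receives a weight close to 2 and the farther point a weight close to -1, while the sum stays at one. Dropping the farther point from the neighborhood leaves a single weight of one, which is the sense in which a smaller nmax or search radius avoids the problem.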