It is not that uncommon to get a negative variance when estimating the higher-level variance in a mixed or multilevel model. Our MLwiN software even has an option headed 'allow negative variances'. And indeed there is currently an open question on ResearchGate about this happening with SAS.
The algorithms used in these models do not 'know' that a variance is being estimated; it is simply a parameter, and negative values can and do occur (as can correlations between random intercepts and slopes outside the range of -1 to +1).
This can be a result of model mis-specification, severe imbalance, and/or lack of power to detect effects (the CIs will straddle 0), but there may also sometimes be something more interesting going on.
The intra-class correlation (rho) is estimated in simple models as the level-2 variance divided by (level-2 variance + level-1 variance). This has two interpretations:
1 the proportion of the variance at the higher level;
2 the degree of dependence or correlation between pairs of lower-level units within a higher-level unit.
E.g. typically some 10% of the variation between pupils in attainment lies at the classroom level; so pick many pairs of children from the same classroom and the calculated correlation will be 0.1.
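To make the arithmetic concrete, here is a minimal Python sketch (the variance values are invented for illustration) of how a negative level-2 variance estimate translates directly into a negative intra-class correlation:

```python
def icc(level2_var, level1_var):
    """Intra-class correlation for a simple two-level model:
    the share of the total variance at the higher level."""
    return level2_var / (level2_var + level1_var)

# Typical schooling example: ~10% of the variation at the classroom level
print(icc(1.0, 9.0))   # 0.1

# A negative level-2 variance estimate implies negative dependence:
# pupils in the same classroom are more unalike than a random pair.
print(icc(-0.5, 9.5))  # about -0.056
```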
And of course it is possible to have negative dependence, and that would require a negative 'variance' at the higher level. The classic example is a litter of pigs competing for food: some may grow at the expense of others, so they are more un-alike than a random sample. And as a geographer I am very used to negative (spatial) autocorrelation, e.g. the alternating black/white pattern of the rook's weighting scheme on a chessboard.
I can vividly remember a discussion with Sir David Cox in the early 1990s where he argued that software should show estimates of negative variance and not hide them from users, and this is now what we do in MLwiN at the higher levels.
What does "Estimated G matrix is not positive definite" mean? - ResearchGate. Available from: https://www.researchgate.net/post/What_does_Estimated_G_matrix_is_not_positive_definite_mean [accessed Jan 8, 2016].
It is possible to get an "adjusted R-sq" that is negative if your explained variance is zero or near zero and the model uses up a large number of degrees of freedom; together these produce that outcome.
Otherwise, as Jochen says, the formulas for explained variance do not allow it to be negative.
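For a concrete illustration (the numbers are invented), using the usual adjustment 1 - (1 - R^2)(n - 1)/(n - p - 1):

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    with n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# A near-zero R^2 combined with many predictors relative to n
# drives the adjusted value below zero:
print(adjusted_r_squared(0.02, 30, 10))  # about -0.496
```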
It may be some estimation problem. It happens quite often in design of experiments modeling when we estimate variance components. It is customary to set such negative estimates to zero, which then induces some bias into the results.
I will look up an example in my lecture notes, Jochen. It happens when the Mean Square Treatments (MSTRT) is less than the Mean Square Error (MSE). Both sources will be estimating random error, and by chance MSTRT < MSE. The estimated variance components will have in the numerator some function of [MSTRT - MSE]. It is data driven, so it happens with some data sets.
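For the balanced one-way random-effects case this is easy to see in code; a minimal sketch with invented mean squares (the method-of-moments estimator divides [MSTRT - MSE] by n, the number of observations per treatment):

```python
def treatment_variance_component(ms_treatment, ms_error, n_per_group):
    """Method-of-moments estimate of the between-treatment variance
    in a balanced one-way random-effects ANOVA:
        sigma_a^2 = (MSTRT - MSE) / n."""
    return (ms_treatment - ms_error) / n_per_group

# When MSTRT < MSE by chance, the estimate goes negative:
print(treatment_variance_component(4.2, 5.0, 8))            # -0.1
# The customary fix is truncation at zero, at the cost of some bias:
print(max(0.0, treatment_variance_component(4.2, 5.0, 8)))  # 0.0
```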
As Raid noted, "It may be some estimation problem." In survey statistics for continuous data, for inference for finite populations, one standard for judging a good variance estimator is that it does not produce (many) negative estimates. Consider the Horvitz-Thompson estimator for population totals. (Not a regression model, but perhaps one could consider it a kind of statistical model.) On page 261 of Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons, Cochran states that both what I'll call the standard "unbiased sample estimator" for the variance of the estimate of a population total, and also a well-known one by Yates and Grundy (and Sen), "...can assume negative values for some sample selection methods."
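A minimal sketch of the Sen-Yates-Grundy estimator (the sample values and inclusion probabilities below are invented; the estimator itself is the standard pairwise form for fixed-size designs):

```python
from itertools import combinations

def syg_variance(y, pi, pi_joint):
    """Sen-Yates-Grundy estimate of the variance of the
    Horvitz-Thompson total for a fixed-size sample:
    sum over sampled pairs i < j of
        ((pi_i * pi_j - pi_ij) / pi_ij) * (y_i/pi_i - y_j/pi_j)**2."""
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        coef = (pi[i] * pi[j] - pi_joint[(i, j)]) / pi_joint[(i, j)]
        total += coef * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return total

# Two sampled units whose joint inclusion probability exceeds the
# product of their individual ones -- exactly the situation in which
# the estimate can turn negative:
y = [10.0, 20.0]
pi = [0.5, 0.5]
pi_joint = {(0, 1): 0.4}
print(syg_variance(y, pi, pi_joint))  # -150.0
```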
It is not that unusual to have to watch for negative values in finite-population inference, in estimates where the results ideally should never be negative.
An important general example that comes to mind is calibrated weights: modifications of probability-of-selection-based survey sampling weights, adjusted to known totals for auxiliary data in model-assisted design-based methods. Calibration weights can also incorporate the regression weights due to heteroscedasticity. Calibration weights may often need further adjustment for problems, including negative results for some of these estimated weights.
Perhaps calibration weights in survey statistics might be pushing the limits of the definition of a statistical model for some people, and they are not a variance estimator as such, but I think they are a good example of estimation that can yield some odd results. However, the ultimate results for inference for finite populations using calibration weights are virtually always greatly improved over the use of survey weights based only on probability of selection, and these calibration weights do feed into the variance estimates too. So, come to think of it, that is part of variance estimation.
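As a small illustration of how a negative calibrated weight can arise, here is a sketch of linear (GREG-type) calibration with one auxiliary variable; the data and the known total are invented:

```python
import numpy as np

def linear_calibration_weights(d, x, known_total):
    """Linear calibration: adjust design weights d so that
    sum(w * x) equals the known auxiliary total, minimising the
    chi-square distance sum((w - d)**2 / d).  Closed form:
        w_i = d_i * (1 + lam * x_i),
        lam = (known_total - sum(d*x)) / sum(d*x*x)."""
    d, x = np.asarray(d, float), np.asarray(x, float)
    lam = (known_total - np.sum(d * x)) / np.sum(d * x * x)
    return d * (1.0 + lam * x)

# One unit with a large auxiliary value plus a known total well below
# the design-weighted total pushes its calibrated weight negative:
x = [1.0, 2.0, 10.0]
w = linear_calibration_weights(d=[10.0, 10.0, 10.0], x=x, known_total=20.0)
print(w)             # approx [ 8.95  7.90 -0.48] -- one weight negative
print(np.sum(w * x)) # 20.0 -- the calibration constraint still holds
```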
Statisticians like David Cox or George Box or Brad Efron or I. J. Good are like legends. It is useful to learn from such individuals. Many years ago, I gave a talk about my Ph.D. results in process control at a statistics conference in Florence; Professor Box was chairing my session. I may have been very nervous. He told me after my talk, "Young man, you are doing well. Such applications are very useful," or something like that, and it encouraged me to do more research in that area. They have the wisdom! I should have also listened to I. J. Good after we met for an hour about my Ph.D. dissertation! Of course, he suggested a Bayesian approach to what I was then trying to do.
If you apply the error propagation law to a linear transformation Y = FX, where X is a random vector and F a matrix of constants (say, the 1-by-2 vector F = [1 1]), and where the variance-covariance matrix C_XX of X is not positive definite (because it has, e.g., strong negative covariances, as in C_XX = [0.9 -0.95; -0.95 0.9]), then the variance of the propagated random vector Y may be negative: C_YY = F*C_XX*F' = -0.1, with F' the transpose of F. This may happen in geostatistics when the covariance matrix of the observables X is determined empirically rather than approximated by a positive definite mathematical covariance function.
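The arithmetic is easy to check numerically; a minimal numpy sketch of the same numbers:

```python
import numpy as np

# A 'covariance matrix' that is not positive definite propagates
# to a negative variance under the linear transformation Y = F X.
F = np.array([[1.0, 1.0]])          # 1-by-2 matrix of constants
C_XX = np.array([[0.90, -0.95],
                 [-0.95, 0.90]])    # strong negative covariances
C_YY = F @ C_XX @ F.T
print(C_YY)                         # [[-0.1]]
print(np.linalg.eigvalsh(C_XX))     # [-0.05  1.85] -> not positive definite
```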