Can anyone point me to a way to test for the normality of censored data in R? I have heard of the Cramér-von Mises statistic, but the paper is behind a paywall.
The site uses the package CvM2SL2Test; however, when I tried to install it, the installation failed. I do not know whether I made a mistake or whether the package is no longer available. I did not see a simple fix.
If this is a prelude to other analyses, then I would ask a few questions.
If you fail to reject the null hypothesis, is it appropriate to conclude that the null hypothesis is true?
Maybe: if you fail to reject the null hypothesis, is it safe to assume that the robustness of the statistical method that assumes normality is sufficient to overcome any minor departure from normality that is present but undetectable at the existing sample size?
Can you do it graphically? Yes, the approach is crude and you don't get p-values. But if you plot a histogram of the data over a normal curve, does the fit look reasonable? A Q-Q plot is another option. And if there are too few data points to make the determination, would you trust the results of any test statistic?
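As for the plots themselves, something like this in base R, with made-up lifespans purely for illustration:

    # Hypothetical lifespans in days, just to illustrate the two plots
    longevity <- c(612, 745, 698, 820, 560, 710, 775, 640, 690, 730)

    # Histogram with a fitted normal curve overlaid
    hist(longevity, freq = FALSE, xlab = "Lifespan (days)",
         main = "Longevity vs. fitted normal curve")
    curve(dnorm(x, mean = mean(longevity), sd = sd(longevity)),
          add = TRUE, lwd = 2)

    # Normal Q-Q plot: roughly straight points suggest approximate normality
    qqnorm(longevity)
    qqline(longevity)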
I want to show that normal distributions are reasonable approximations to rodent longevities. If they are, both the design and the analysis of experiments become much more efficient, because we do not have to use nonparametric methods such as the Cox model. (Yes, I know that technically Cox is "semiparametric.")
I think the key to all of this is to find a quantifiable value that defines "reasonable approximation," along with the scientific justification for that choice. Once that is done, the rest should be relatively easy. At what point do departures from normality stop biasing the statistical results enough to have economic, political, social, or legal consequences?
At this point, my reaction is to suggest that the question is unanswerable, especially with a single statistic, no matter how sophisticated. Still, sometimes it is good to tackle what seem to be unsolvable problems.
It is not clear to me what you are doing or why you need a distribution. At any rate, "normality" is usually not the "norm." You say, "I want to show that normal distributions are reasonable approximations to rodent longevities." But longevity sounds like a reliability problem to me, so if you are doing something that requires a distribution, don't you want the Weibull? Normality does not seem to be the case, and if you are "censoring," I expect that means you are cutting off the tail of your distribution, which means you should not expect any kind of classic fit unless you cut very little.
Or, by "censored data," do you just mean the end of the period of longevity?
Anyway, it sounds like a reliability problem, just like estimating the life of a light bulb.
Best wishes - Jim
PS - Beware of p-values. They do not stand alone. A type II error analysis, or something similar, is needed to account for effect size.
Regarding misused p-values, see the American Statistical Association's 2016 statement on p-values (Wasserstein & Lazar, The American Statistician) and its accompanying press release.
I'm not testing for "reliability" in an industrial sense. I'm comparing longevities of different treatment groups. If the normal distribution is a better fit to reality and the mathematics are a lot simpler, why use Weibull?
If you are comparing two groups, you could get a confidence interval for the difference in their means. With a large enough sample size, the normal distribution is fine for the standard errors of means used in confidence intervals. If the population distribution is anywhere close to "normal," then the sample size does not have to be very large for the distribution of that statistic (the mean) to look very "normal." (You might look at Chebyshev's inequality as a worst case.) The central limit theorem helps with means, if I understand your problem.
A confidence interval for your difference in means, like a p-value, is sample-size dependent, but it is more practically interpretable. If you have good "confidence" in an interval that does not include zero, then it gives you an idea of how different those means might be (an effect size).
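To make that concrete, a small sketch in R with simulated lifespans (all numbers invented):

    # Two hypothetical treatment groups of simulated lifespans (days)
    set.seed(1)
    control   <- rnorm(30, mean = 700, sd = 80)
    treatment <- rnorm(30, mean = 760, sd = 80)

    # Welch two-sample t-test; the confidence interval is for the
    # difference in mean longevity between the two groups
    res <- t.test(treatment, control)
    res$conf.int   # an interval excluding zero suggests a real difference
    res$estimate   # the two sample means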
If Dr. Robertson has already measured different longevities for different treatments, then it is not necessary to use normal distributions. If well-measured data are censored, then the analysis is "abnormal," because censoring affects the mean longevity of the particular group. Is it necessary to model the probabilities at all? In that case each treatment will have its own mean and its own distribution, clearly different from normal ones.
Normal distribution theory is, in my opinion, the wrong theory and practice. Some time ago, I gave you an alternative that only requires the minimum longevity, the maximum longevity, and the mean longevity for each treatment. It does not use any dispersion parameter, only the mean. Of course, it is only a proxy model: it fits the extreme values and the mean exactly, using the data plus the mean of the data.
Why do normal distributions have so many followers yet produce so many wrong results?
Is Dr. Chaves familiar with survival analysis? In real life, most experiments have a portion of longevity data lost to follow-up (right-censored). There may also be right truncation due to time limits: most grants only fund for two years, while rodents live longer. There will also typically be left truncation, since the rodents are a few weeks old when the experiment begins. Parametric survival analysis incorporates the censored and truncated data and fits known distributions; some distributions will fit better than others. Semiparametric models such as the Cox model lose power because they do not try to fit the hazard function to a curve when, in fact, the data may follow one. The assumption that the data cannot fit a known distribution is potentially a strong one.
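Here is a rough sketch of that idea with the survival package in R; the data are simulated and the two-year cutoff is just for illustration (handling left truncation would need extra machinery beyond this):

    # Simulated lifespans, right-censored at a two-year study limit
    library(survival)
    set.seed(42)
    true_time <- rweibull(100, shape = 4, scale = 750)  # hypothetical lifespans
    time   <- pmin(true_time, 730)                      # observe up to day 730
    status <- as.numeric(true_time <= 730)              # 1 = death observed

    # Fit several candidate distributions and compare by AIC (lower is better)
    dists <- c("weibull", "lognormal", "gaussian")
    fits  <- lapply(dists, function(d) survreg(Surv(time, status) ~ 1, dist = d))
    setNames(sapply(fits, AIC), dists)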
Hi Henry Robertson, you can read the article online through JSTOR: http://www.jstor.org/stable/2335622. Also see "Testing for Normality of Censored Data" by J. Anderson (DiVA), available at https://www.diva-portal.org/smash/get/diva2:816450/FULLTEXT01.pdf
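If you also want something runnable right away, one possible route is the fitdistrplus package, which fits distributions to censored data; this is only a sketch with invented numbers, and it gives a fitted model plus diagnostics rather than a formal test statistic:

    # Fit a normal distribution to right-censored data with fitdistrplus.
    # fitdistcens() wants a data frame with 'left' and 'right' bounds;
    # for a right-censored observation, right = NA.
    library(fitdistrplus)
    set.seed(7)
    x <- rnorm(50, mean = 700, sd = 80)              # hypothetical lifespans
    d <- data.frame(left  = pmin(x, 800),            # observed or cutoff value
                    right = ifelse(x > 800, NA, x))  # NA marks censoring

    fit_norm <- fitdistcens(d, "norm")
    summary(fit_norm)      # parameter estimates, log-likelihood, AIC/BIC
    cdfcompcens(fit_norm)  # empirical vs fitted CDF, adapted for censoring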