You could use zero-inflated or zero-altered models. From your description, I would assume that the sick part cannot contain any zeros. In that case, you would use a zero-altered model (also called a two-part or hurdle model). You would first model healthy versus sick with a binomial distribution and then model the sick, non-zero part with something like a truncated Poisson or truncated negative binomial distribution (for discrete/count data).
If the sick part can contain zeros as well, then a zero-inflated model is used. Here healthy versus sick is again modeled by a binomial distribution, but the sick part (which now can also contain zeros) is modeled by an ordinary (untruncated) Poisson or negative binomial distribution.
If your sick part has a continuous response, you would of course have to replace the Poisson or negative binomial distribution by a suitable continuous distribution, like the gamma distribution.
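For a continuous response, the two-part idea can be sketched in base R roughly as follows. This is only an intercept-only illustration on simulated data; the names dat, detected and conc are made up for the example and do not come from the data discussed here.

```r
# Simulated data, for illustration only (not the data from this thread).
set.seed(42)
n        <- 69
detected <- rbinom(n, size = 1, prob = 0.85)           # 1 = response above zero
conc     <- ifelse(detected == 1,
                   rgamma(n, shape = 2, rate = 0.5),   # positive responses
                   0)                                  # zero responses
dat <- data.frame(detected = detected, conc = conc)

# Part 1: probability of a non-zero response (binomial model, logit link).
fit_zero <- glm(detected ~ 1, family = binomial, data = dat)
p_pos    <- unname(plogis(coef(fit_zero)))

# Part 2: size of the response given that it is non-zero (gamma model, log link).
fit_pos <- glm(conc ~ 1, family = Gamma(link = "log"),
               data = dat, subset = conc > 0)
mu_pos  <- unname(exp(coef(fit_pos)))

# Combined expected value: P(non-zero) * E(response | non-zero).
overall_mean <- p_pos * mu_pos
```

In this intercept-only case the product simply reproduces the overall sample mean; the two-part structure becomes useful once covariates enter either part.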
There are different packages in R to model such data. Also Bayesian techniques with MCMC are very useful when dealing with zero-inflated models.
Some good references include the books by Alain Zuur and Elena Ieno, "Mixed Effects Models and Extensions in Ecology with R" and "Zero Inflated Models and GLMM with R".
Many thanks. I will follow your advice. Yes, indeed, the healthy cohort (32) is negative, with responses below the detection limit. The sick (69) are in the majority (85%) positive, but 15% are below the detection limit. I would just like the issue to be totally clear. Once again, thank you for your help.
It is difficult to help because you did not give any details about the variable or a possible underlying model, and you did not state a clear aim. So I can only guess...
Your data have no variation in one group (healthy): it is "degenerate", so there is no (standard?) tool to do any statistical analysis.
A work-around might be to add a positive value to the healthy group. This will increase the sample size by one and shift the expectation of a positive value from the undefined "no idea" to the defined "very unlikely".
Now you have different options to treat this data:
One could ask for the probability of observing a non-zero value in each group, and whether these probabilities differ. This would be achieved by a binomial model.
One could ask for expected values and the expected difference between the groups. Here it will matter what kind of data you have. If these are counts, you should consider a Poisson or negative-binomial model; if these are concentrations or something similar, you should consider a generalized normal model with log-link.
Given you only want a p-value for the comparison of the proportions of non-zero results, you can use Fisher's exact test. Wilcoxon's test can also be used, but this will test the hypothesis of stochastic equivalence.
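As a rough sketch of the Fisher's exact test option in R: the counts below are only approximated from the numbers given in this thread (0 of 32 healthy positive; 85% of 69 patients positive, taken here as 59 positive and 10 below the detection limit), so treat them as an assumption and replace them with the real counts.

```r
# Approximate 2 x 2 table; the patient counts are rounded from "85% of 69".
tab <- matrix(c( 0, 32,    # healthy: positive, below limit
                59, 10),   # sick:    positive, below limit
              nrow = 2,
              dimnames = list(result = c("positive", "below_limit"),
                              group  = c("healthy", "sick")))

# Fisher's exact test for a difference in the proportion of non-zero results.
ft <- fisher.test(tab)
ft$p.value
```

Wilcoxon's test would need the raw measurements rather than the table, e.g. wilcox.test() on the values of the two groups.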
To make it even clearer: these are the results of a biomarker screen, and we believe the biomarker is associated with a human cancer. The healthy people, since they don't have cancer, did not produce the biomarker. Among the cancer patients, the majority have different levels of this molecule, and some have none. I hope this will help you to help me. I am a simple biochemist and have never dealt with such problems.
I don't think that ZI-models work on data where there are only 0's in one group. I would be grateful if you could tell me what model really works in this case.
Further, I think count models (Poisson or negative binomial) are inappropriate for concentration measures. Here, a gamma model would be great, but this does not work with 0's at all. Or is there a gamma-like model that works when response values are 0?
@Adam:
Now I am confused...
It is a biomarker, you say. Further, you say that you know that this marker is not present in healthy persons, and that you know that it is present in tumor patients.
If this is correct, then it makes no sense to compare healthy persons and tumor patients at all. The only research question I can think of that would remain is: what is a typical or expected concentration of the marker in tumor patients? Or: what is the expected lower limit of this marker in tumor patients?
The data from the healthy group is actually completely uninteresting.
The questions I noted can be answered best with a gamma model, i.e. a regression model of the gamma family with log link. In R: glm(Conc ~ 1, family = Gamma(link = "log")). An alternative is a direct fit of the gamma distribution to the data. In R, the function fitdistr() in the package MASS will do this: MASS::fitdistr(Conc, densfun = "gamma").
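A slightly fuller sketch of both calls, run on simulated positive concentrations (the values and the sample size of 59 are assumptions for illustration, not real data). The Wald interval and the 5% quantile are just one possible way to express the "expected concentration" and the "expected lower limit".

```r
library(MASS)  # provides fitdistr(); MASS ships with standard R installations

# Simulated positive concentrations, for illustration only.
set.seed(1)
Conc <- rgamma(59, shape = 2, rate = 0.5)

# Gamma regression with log link, intercept only.
fit <- glm(Conc ~ 1, family = Gamma(link = "log"))
expected_conc <- unname(exp(coef(fit)))   # back-transformed to the concentration scale
ci <- exp(confint.default(fit))           # Wald 95% interval, back-transformed

# Direct fit of the gamma distribution (estimates shape and rate).
fd <- fitdistr(Conc, densfun = "gamma")

# A low quantile of the fitted distribution as a rough "expected lower limit".
lower_5pct <- unname(qgamma(0.05,
                            shape = fd$estimate["shape"],
                            rate  = fd$estimate["rate"]))
```

For the intercept-only model the back-transformed coefficient equals the sample mean; the model is mainly useful for the interval and for adding covariates later.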
My initial thought was also that I have nothing to compare between the healthy and the tumor population. We definitely did define the limit of the biomarker assay. Unfortunately, the most sophisticated software I am able to run is GraphPad Prism.
It really depends on your research question. If you want to be able to explain the presence or absence of the biomarker as well as the concentration of the marker when it is present, a ZI model is appropriate. If you only want to model the concentration of the marker when it is present, as seems to be the case, then of course a ZI model is not relevant.
In the initial question, there was no mention of the type of response in the sick group, so I assumed it was discrete. Obviously, for a continuous response you need a continuous distribution, and I did mention the gamma as an example. It is true that a gamma distribution has a positive support and is not suitable for 0 values. However, if the assumption is that the marker is present in the tumor patient, then theoretically, a 0 concentration is not possible and all patients have a positive concentration. If the concentration is below the detection limit, I would assign those cases the value of the detection limit (or some other low/background value). Assigning 0 would be fundamentally wrong, as all tumor patients have the marker.
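The substitution could look roughly like this in R. The detection limit lod and the measurements are made-up values for illustration; NA is used here to mark a result below the detection limit.

```r
# Hypothetical detection limit and simulated measurements, for illustration only.
lod <- 0.5
set.seed(7)
true_conc <- rgamma(69, shape = 1.5, rate = 0.5)
measured  <- ifelse(true_conc < lod, NA, true_conc)   # NA = below detection limit

# Assign the detection limit (not 0) to the cases below the detection limit.
conc_sub <- ifelse(is.na(measured), lod, measured)

# All values are now strictly positive, so the gamma model can be fitted.
fit_sub <- glm(conc_sub ~ 1, family = Gamma(link = "log"))
exp(coef(fit_sub))   # expected concentration after substitution
```

The estimate depends on the substituted value, so it is worth repeating the fit with another low/background value to see how much the result changes.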