Any suggestions on how to deal with these two sets of data: a healthy group with only zero values and a sick group with a broad distribution?

Rianne Jacobs

You could use zero-inflated (or zero-altered) models. From your description, I would assume that the sick part cannot contain any zeros. In that case, you would use a zero-altered model (also called a two-part or hurdle model). You would first model healthy vs. sick with a binomial distribution and then model the sick, non-zero part with something like a zero-truncated Poisson or zero-truncated negative binomial (for discrete/count data).

If the sick part can contain zeros as well, then a zero-inflated model is used. Here healthy vs. sick is again modeled by a binomial distribution, but the sick part (which can now also contain zeros) is modeled by an ordinary (untruncated) Poisson or negative binomial.

If your sick part has a continuous response, you would of course have to replace the Poisson or negative binomial distribution with a suitable continuous distribution, like the gamma distribution.
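To make the two-part (hurdle) idea concrete for a continuous response, here is one way it could be written down. The symbols are my own notation, not from the thread: \pi is the probability of a non-zero value, and g is the density of a positive continuous distribution such as the gamma with mean \mu.

```latex
f(y) =
\begin{cases}
  1 - \pi,      & y = 0, \\
  \pi \, g(y),  & y > 0,
\end{cases}
\qquad
\mathbb{E}[Y] = \pi \, \mu .
```

The binomial part estimates \pi and the continuous part estimates \mu from the non-zero values only, so the two parts can be fitted separately.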

There are different packages in R to model such data. Also Bayesian techniques with MCMC are very useful when dealing with zero-inflated models.

Some good references include the books by Alain Zuur and Elena Ieno, "Mixed Effects Models and Extensions in Ecology with R" and "Zero Inflated Models and GLMM with R".

Adam Lesner

Many thanks. I will follow your advice. Yes, indeed, the healthy cohort (32) is entirely negative, with responses below the detection limit. The majority of the sick (85% of 69) are positive, but 15% are below the detection limit. I would just like the issue to be totally clear. Once again, thank you for your help.

Jochen Wilhelm

It is difficult to help because you did not give any details about the variable or a possible underlying model, and you did not clearly state an aim. So I can only guess...

Your data have no variation in one group (healthy): it is "degenerate", so there is no (standard?) tool to do any statistical analysis.

A work-around might be to add a positive value to the healthy group. This will increase the sample size by one and shift the expectation of a positive value from the undefined "no idea" to the defined "very unlikely".

Now you have different options to treat this data:

One could ask for the probability of observing a non-zero value in each group, and if these probabilities are different. This would be achieved by a binomial model.

One could ask for expected values and the expected difference between the groups. Here it will matter what kind of data you have. If these are counts, you should consider a Poisson or a negative-binomial model; if these are concentrations or something similar, you should consider a generalized normal model with log-link.

Given that you only want a p-value for the comparison of the proportions of non-zero results, you can use Fisher's exact test. Wilcoxon's test can also be used, but this will test the hypothesis of stochastic equivalence.
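For the Fisher's exact test mentioned above, a minimal Python sketch using only the standard library could look like this. The counts are taken from the numbers given earlier in the thread (32 healthy, all below the detection limit; 69 patients, 85% positive). Rounding 85% of 69 to 59 positives is my assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of "positive" results
    n = a + b + c + d     # total sample size
    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)

    def weight(x):
        # Unnormalised hypergeometric probability of x positives in group 1.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, row1)


# Rows: healthy, sick. Columns: positive, below detection limit.
# 59 positives is an assumed rounding of "85% of 69".
p_value = fisher_exact_two_sided(0, 32, 59, 10)
print(f"two-sided p = {p_value:.3g}")
```

The weights are compared as exact integers, so no floating-point tolerance is needed when deciding which tables count as "at least as extreme".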

To make it even clearer: these are the results of a biomarker screen that we believe is associated with one of the human cancers. The healthy people, since they don't have cancer, did not produce the biomarker. Among the cancer patients, the majority have different levels of this molecule, and some have none. I hope this will help you to help me. I am a simple biochemist and have never dealt with such problems.

And of course, the measured value is the concentration of this molecule.

@Rianne:

I don't think that ZI-models work on data where there are only 0's in one group. I would be grateful if you could tell me what model really works in this case.

Further, I think count models (Poisson or NB) are inappropriate for concentration measures. Here, a gamma model would be great, but it does not work with 0's at all. Or is there a gamma-like model that works when response values are 0?

@Adam:

Now I am confused...

It is a biomarker, you say. Further, you say that you know that this marker is not present in healthy persons, and that you know that it is present in tumor patients.

If this is correct, then it makes no sense to compare healthy persons and tumor patients at all. The only research question I can think of that would remain is: what is a typical or expected concentration of the marker in tumor patients? Or: what is the expected lower limit of this marker in tumor patients?

The data from the healthy group is actually completely uninteresting.

The questions I noted can be answered best with a gamma model, i.e. a regression model of the gamma family with log-link; in R: glm(Conc~1, family=Gamma(link="log")). An alternative is the direct fit of the gamma distribution to the data. In R there is the function fitdistr() in the package MASS that will do this: MASS::fitdistr(Conc,densfun = "gamma").
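For readers without R, here is a rough Python sketch of the same idea using only the standard library. It is not a port of glm() or fitdistr(): it relies on the fact that in an intercept-only gamma model the fitted mean equals the sample mean (so the log-link intercept is its logarithm), and it estimates shape and rate by the method of moments rather than by maximum likelihood. The concentration values and the function name are made up for illustration.

```python
import math
from statistics import mean, variance


def fit_gamma_moments(values):
    """Method-of-moments gamma fit for strictly positive values.

    Note: this is not maximum likelihood, so shape and rate will
    generally differ somewhat from what MASS::fitdistr() reports.
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    m = mean(values)
    v = variance(values)  # sample variance (n - 1 in the denominator)
    return {
        "mean": m,                 # fitted mean of an intercept-only model
        "log_mean": math.log(m),   # corresponds to the log-link intercept
        "shape": m * m / v,
        "rate": m / v,
    }


# Hypothetical concentrations from tumor patients (illustrative only).
conc = [1.2, 0.8, 3.5, 2.1, 0.6, 5.0]
fit = fit_gamma_moments(conc)
print(fit)
```

Values at or below zero are rejected on purpose, since the gamma distribution is only defined for positive values; below-detection-limit results have to be handled before this step.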

Jochen

Thank you

My initial thought was that I have nothing to compare between the healthy and tumor populations. We definitely did define the limit of the biomarker assay. Unfortunately, the most sophisticated software I am able to run is GraphPad Prism.

Jochen,

It really depends on your research question. If you want to be able to explain the presence or absence of the biomarker as well as the concentration of the marker when it is present, a ZI model is appropriate. If you only want to model the concentration of the marker when it is present, as seems to be the case, then of course a ZI model is not relevant.

In the initial question, there was no mention of the type of response in the sick group, so I assumed it was discrete. Obviously, for a continuous response you need a continuous distribution, and I did mention the gamma as an example. It is true that a gamma distribution has a positive support and is not suitable for 0 values. However, if the assumption is that the marker is present in the tumor patient, then theoretically, a 0 concentration is not possible and all patients have a positive concentration. If the concentration is below the detection limit, I would assign those cases the value of the detection limit (or some other low/background value). Assigning 0 would be fundamentally wrong, as all tumor patients have the marker.