I am trying to calculate the frequency of every pathogenic germline variant in every disease cohort. For example, for variant 1:17588689 there are 20 heterozygous samples (column H), and I need to report what percentage of these samples are in the glioma cohort, what percentage are in the meningioma cohort, and so on. I want to append this information for each disease cohort (meningioma, glioma, schwannoma, pituitary adenoma, others) as new columns after the last column, and I also have to make sure that, for each variant, the percentages add up to 100% (i.e. the proportions sum to 1). The heterozygous sample IDs are listed in column AZ, titled "HetSamples", and the IDs are separated by commas in my dataset. I am stuck at some point, so I would appreciate it if someone could help me complete it.

```
library(stringr)
library(tidyr)
library(dplyr)
```
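A minimal sketch of the remaining steps, using the same three packages. It assumes (hypothetically) that the variant table has an ID column called `Variant` plus the `HetSamples` column, and that `diseaseData` is a lookup table with one row per sample and columns named `SampleID` and `Cohort`; rename these to match the real data. The toy values below are made up purely for illustration. The idea: split `HetSamples` into one row per (variant, sample), look up each sample's cohort, count per cohort, divide by the per-variant total, then reshape to one column per cohort and join back onto the original table.

```r
library(stringr)
library(tidyr)
library(dplyr)

# Hypothetical toy data: one row per variant, heterozygous sample IDs
# comma-separated in HetSamples (column AZ in the real dataset).
variants <- tibble(
  Variant    = c("1:17588689", "2:12345"),
  HetSamples = c("S1,S2,S3,S4", "S2, S5")
)

# Hypothetical lookup table: the disease cohort of each sample.
diseaseData <- tibble(
  SampleID = c("S1", "S2", "S3", "S4", "S5"),
  Cohort   = c("glioma", "glioma", "meningioma", "schwannoma", "pituitary adenoma")
)

cohorts <- c("meningioma", "glioma", "schwannoma", "pituitary adenoma", "others")

cohortFreq <- variants %>%
  select(Variant, HetSamples) %>%
  separate_rows(HetSamples, sep = ",") %>%              # one row per (variant, sample)
  mutate(HetSamples = str_trim(HetSamples)) %>%          # drop spaces after commas
  left_join(diseaseData, by = c("HetSamples" = "SampleID")) %>%
  # samples with an unlisted or missing cohort are counted as "others"
  mutate(Cohort = if_else(Cohort %in% cohorts, Cohort, "others")) %>%
  count(Variant, Cohort) %>%
  group_by(Variant) %>%
  mutate(Prop = n / sum(n)) %>%                          # proportions sum to 1 per variant
  ungroup() %>%
  select(-n) %>%
  mutate(Cohort = factor(Cohort, levels = cohorts)) %>%
  complete(Variant, Cohort, fill = list(Prop = 0)) %>%   # add absent cohorts as 0
  pivot_wider(names_from = Cohort, values_from = Prop)

# Append one column per cohort after the last existing column.
result <- left_join(variants, cohortFreq, by = "Variant")

# Sanity check: each variant's cohort proportions add up to 1.
stopifnot(all(abs(rowSums(result[cohorts]) - 1) < 1e-9))
```

Multiplying the cohort columns by 100 gives percentages instead of proportions. Counting the rows after `separate_rows()` also gives a quick cross-check against the carrier count in column H.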

```
diseaseData
```