I am trying to analyse mutation data for endometrial cancer obtained from different studies across several databases (COSMIC, cBioPortal, IntOGen). I have collated the data and grouped the mutations by gene. The analysis focuses on non-synonymous coding mutations, because these are the mutations most likely to change normal protein function.
The aim of the study is to understand the mutational landscape of endometrial cancer. The main objectives are to find the commonly mutated genes in endometrial cancer, to find the genes whose mutations are significantly damaging, and to create an updated gene list that can be compared against commercial gene panels.
I have created this table with the collated data:
The idea here is to use mutation burden as a proxy for how damaging each gene's mutations are in endometrial cancer. We then created a composite score so that genes can be compared on a single figure.
At the moment, our list stands at more than 16,000 genes. We are looking for a statistical way to narrow it down to only those genes that are significantly mutated relative to the others. Any advice is greatly appreciated.
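For concreteness, here is a minimal sketch of one possible filter, not something that has been run on the real data. It assumes a single uniform background mutation rate across all genes, so each gene's expected count is proportional to its coding length. That is a strong simplification, since it ignores covariates such as expression and replication timing. Each gene's observed count is tested against that expectation with a one-sided Poisson test, and the p-values are corrected with the Benjamini-Hochberg procedure. The function names, the input layout (gene mapped to mutation count and coding length) and the toy numbers are all hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); k is an integer count, lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # Sum P(X = i) for i = k-1 down to 0 and take the complement.
        term = math.exp(-lam + (k - 1) * math.log(lam) - math.lgamma(k))
        lower = 0.0
        for i in range(k - 1, -1, -1):
            lower += term
            term *= i / lam  # P(X = i-1) = P(X = i) * i / lam
        return max(0.0, 1.0 - lower)
    # k > lam: terms shrink monotonically, so sum upward until negligible.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i  # P(X = i) = P(X = i-1) * lam / i
    return min(total, 1.0)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def significantly_mutated(genes, alpha=0.05):
    """genes: dict of gene -> (n_mutations, coding_length_bp).

    Assumption: one uniform background rate, so the expected count for a
    gene is total_mutations * gene_length / total_length.
    """
    total_mut = sum(n for n, _ in genes.values())
    total_len = sum(length for _, length in genes.values())
    names = list(genes)
    pvalues = []
    for name in names:
        n, length = genes[name]
        expected = total_mut * length / total_len
        pvalues.append(poisson_upper_tail(n, expected))
    qvalues = benjamini_hochberg(pvalues)
    return {name: q for name, q in zip(names, qvalues) if q < alpha}


# Made-up toy input, purely to show the call; not real data.
toy = {
    "GENE_A": (60, 1200),
    "GENE_B": (5, 3000),
    "GENE_C": (8, 5000),
    "GENE_D": (3, 2000),
}
hits = significantly_mutated(toy)
print(sorted(hits))
```

In this toy input only GENE_A carries far more mutations than its length predicts, so it is the only gene that survives the correction. With real data the background rate would ideally be estimated per sample or per mutation context rather than pooled as it is here.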