Hi,

I'd like to analyse correlations between genetic variation and CpG methylation, i.e. methylation QTLs (meQTLs). The methylation level (beta-value, bounded between 0 and 1) of any single CpG probe is typically not normally distributed and is often heavily skewed to the left or to the right. On top of that, the genetic data are imputed and therefore coded as "dosages" rather than hard genotype calls. I wonder what the best practical method is in terms of:

  • Speed, i.e. analyses of regions take tens of seconds, rather than hours.
  • Ease-of-use, i.e. virtually no re-formatting of data is needed.
  • Methodological soundness, i.e. dosage data and skewed methylation data are handled properly.
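
To make the skewness point concrete: two transformations I have seen applied to beta-values are the logit-based M-value and a rank-based inverse-normal transform. Below is a minimal Python sketch of both. The function names, the clipping constant eps and the rank offset c are my own illustrative choices (assumptions, not taken from any of the tools listed here), and whether either transform is the right one for meQTL analysis is part of what I am asking.

```python
import numpy as np
from scipy.stats import norm, rankdata


def beta_to_m(beta, eps=1e-6):
    """Logit-transform beta-values into M-values: log2(beta / (1 - beta)).

    eps is a hypothetical clipping constant that keeps values of exactly
    0 or 1 from producing infinities.
    """
    beta = np.clip(np.asarray(beta, dtype=float), eps, 1.0 - eps)
    return np.log2(beta / (1.0 - beta))


def inverse_normal_transform(x, c=3.0 / 8.0):
    """Rank-based inverse-normal transform of one CpG across samples.

    Ties receive average ranks; c is the rank offset (3/8 is one common
    choice, assumed here). Missing values are not handled.
    """
    x = np.asarray(x, dtype=float)
    ranks = rankdata(x)
    return norm.ppf((ranks - c) / (len(x) - 2.0 * c + 1.0))
```

The M-value keeps the relative spacing of the measurements, whereas the rank-based transform forces a normal shape but discards everything except the ordering of the samples.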

I've been working with:

  • SNPTEST v2.5.2 with the -use_raw_phenotypes flag - speed is so-so. Plus I don't know whether the underlying statistical methods are appropriate for these types of data - I'm a biologist, not a statistician.
  • FastQTL - this is super fast, but its significance levels differ from SNPTEST's by a factor of 10 or 100. Link: http://fastqtl.sourceforge.net
  • R - this is extremely slow and needs a lot of re-formatting before it can handle these data...

The current 1000G phase 3 dataset holds 40+ million well-imputable variants, and the Methylation 450K array holds 400k+ QC-passed CpGs. The number of comparisons in any region will therefore be large: on average 3,000-5,000 variants vs. 100-250 CpGs, i.e. roughly 0.3 to 1.25 million tests per region. And that covers only cis-acting methylation QTLs... In other words: your insight is highly appreciated! :-)
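
For reference, here is a minimal Python sketch of the kind of per-region scan I have in mind. It assumes no covariates and no missing values, a dosage matrix with samples in rows and variants in columns, and a methylation matrix with the same samples in rows and CpGs in columns. Under those assumptions, simple linear regression of every CpG on every variant reduces to one matrix product of standardised columns followed by a t-test on the Pearson correlation. The function name cis_scan is hypothetical, and this is not how SNPTEST or FastQTL are implemented.

```python
import numpy as np
from scipy import stats


def cis_scan(dosage, meth):
    """Test every variant against every CpG in one region.

    dosage: (n_samples, n_variants) imputed allele dosages in [0, 2]
    meth:   (n_samples, n_cpgs) methylation values (possibly transformed)
    Returns Pearson r and two-sided p-values, each (n_variants, n_cpgs).
    Monomorphic variants (zero variance) yield NaN.
    """
    G = np.asarray(dosage, dtype=float)
    Y = np.asarray(meth, dtype=float)
    n = G.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        # Standardise each column (population SD, so r = Gs.T @ Ys / n).
        Gs = (G - G.mean(axis=0)) / G.std(axis=0)
        Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
        r = np.clip(Gs.T @ Ys / n, -1.0, 1.0)
        # t-statistic for a correlation, n - 2 degrees of freedom;
        # identical to the slope test in simple linear regression.
        df = n - 2
        t = r * np.sqrt(df / (1.0 - r ** 2))
    p = 2.0 * stats.t.sf(np.abs(t), df)
    return r, p
```

With 5,000 variants and 250 CpGs this is a single 5,000 x 250 matrix product per region, which is why I would expect seconds rather than hours to be feasible; adding covariates would require regressing them out of both matrices first.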

Thanks!

Sander
