In our paper, we show that polymorphisms (SNPs and indels) in probe sequences can induce a serious technical artefact, through reduced hybridization efficiency, in expression QTL (eQTL) studies, especially those using array-based technologies. You can find the paper here: http://nar.oxfordjournals.org/content/early/2013/02/21/nar.gkt069
But for all of you busy people, the main result is summarized in Figure 1 (also attached below).
Quick summary:
- About 6.1% of the probes (25-mers) in the Affymetrix Human Exon 1.0 array contain polymorphisms, but they account for 50–90% of cis-eQTLs
- About 11.7% of the probes (50-mers) in the Illumina HT12 array contain polymorphisms, but they account for 30–45% of cis-eQTLs
- The binding efficiency of longer probes is less affected by polymorphisms (as expected), but still > 30% of cis-eQTLs are false!
- The 1000G dataset appears to be good enough to identify probes containing polymorphisms. The only way to possibly improve on this is to look at private mutations from exome-seq etc., but that may not be worth the effort.
- Increasing p-value stringency seems to make the situation worse (cis-eQTLs due to polymorphisms are very strong), so many published cis-eQTLs may suffer from this artefact (especially if the authors have not checked their probes for polymorphisms, or used an older reference panel to identify them)
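
If you want to check your own probe set, the basic idea of flagging probes that contain a known polymorphism can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used in the paper: the function name `flag_polymorphic_probes`, the tuple layouts and the toy coordinates are all made up for the example, and each variant is treated as a single 1-based position (so an indel would need to be expanded to the bases it spans). In practice the variant list would come from a reference panel such as 1000G and the probe coordinates from the array annotation.

```python
from bisect import bisect_left
from collections import defaultdict


def flag_polymorphic_probes(probes, variants):
    """Return the set of probe IDs whose interval contains at least one variant.

    probes:   iterable of (probe_id, chrom, start, end), 1-based inclusive
    variants: iterable of (chrom, pos), 1-based (hypothetical layout)
    """
    # Group variant positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    flagged = set()
    for probe_id, chrom, start, end in probes:
        positions = by_chrom.get(chrom, [])
        # First variant at or after the probe start; the probe is hit
        # if that variant also lies at or before the probe end.
        i = bisect_left(positions, start)
        if i < len(positions) and positions[i] <= end:
            flagged.add(probe_id)
    return flagged


# Toy data, invented for illustration only.
probes = [
    ("probe_A", "chr1", 100, 124),  # 25-mer
    ("probe_B", "chr1", 200, 224),  # 25-mer
    ("probe_C", "chr2", 100, 149),  # 50-mer
]
variants = [("chr1", 110), ("chr1", 300), ("chr2", 150)]

flagged = flag_polymorphic_probes(probes, variants)
print(sorted(flagged))
```

Flagged probes could then be dropped before the eQTL analysis, or their cis-eQTLs reported separately so readers can judge them with the artefact in mind.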