Exome Scan has arrived! Help!!!

J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Re: Exome Scan has arrived! Help!!!

Post by J11 »

The 47 million 100% calls were from the imputation from the Kickstarter campaign. This is quite startling. I am now wondering how much additional information is contained in these SNPs, and in particular how it might improve DNA family matching. Many of the more distant family matches turn out to be identical by chance rather than by descent.

With almost 100 times more SNPs I wonder how much more accurate these genealogical matches could be.

I think the thing to keep in mind is that so much of the genome is not mutated. Even for supermutants! Imputation is able to fill in all these gaps.

I have already been able to use my imputation to correct the exome scan. Many of the most interesting variants from the exome scan were likely errors. I was very worried about some APP variants that seemed to be likely causes for our loved one's AD. The imputation clearly found that these SNPs were sequencing artifacts. It is simply an inherent property of sequencing that such errors would emerge. Double checking against an imputation is an inexpensive way of confirming the accuracy of an exome or genome scan. We were considering trying yet another exome scan to try and work through all the errors from the original scan. With this imputation that might not be necessary.
Gilgamesh
Contributor
Posts: 1711
Joined: Sat Oct 26, 2013 11:31 am
Location: Northeast US mostly

Post by Gilgamesh »

I obviously have a lot to learn about imputation (in particular, I'm guessing, about meiosis, and how that affects how much and under what conditions imputation makes sense), since I'm inclined to say that an exome scan would, in some cases, be useful in order to correct imputation, not the other way around. This would of course depend on the DNA region involved, the "depth" of the exome scan, and so on.

Must hit the books! Thanks again for the inspiration to look into this.

GB
circular
Senior Contributor
Posts: 5565
Joined: Sun Nov 03, 2013 10:43 am

Post by circular »

Here's something new from Alzforum's e-news …

The whole article is interesting, and no doubt the Nature paper is too, but here's a twist:

"Catalog of Humanity—Variation in 2,500 Genomes Ready for Perusal"
The new structural-variants catalog can help scientists studying diseases inherited in families, just as SNP databases already do. Doctors who suspect a mutation, inversion, or deletion is to blame for an age-related neurodegenerative disease, for example, can check the 1000 Genomes data to find out if that variant commonly occurs. If it does, it might be innocuous, explain the authors. Without that context, a structural variation found in an afflicted family might be mistaken for the cause of the disease. The papers do not reveal the age or health of the people who donated the DNA samples.
http://www.alzforum.org/news/research-n ... 7-91740133
ApoE 3/4 > Thanks in advance for any responses made to my posts.
SNPedia
Contributor
Posts: 15
Joined: Mon Oct 05, 2015 10:34 am

Post by SNPedia »

If we believed the benefits of adding imputation to SNPedia or Promethease outweighed the drawbacks we would add it to one or both. But genomics is uncertain enough right now without deliberately adding in another layer of guesswork, regardless of whether it is free or $30.

Care for a very recent example, from a Nature Communications article published this week about variants associated with glioma risk? Of 14 imputed SNPs selected for follow-up study, 3 had to be dropped due to "poor concordance between imputed and sequenced genotypes". A 20% non-concordance rate is way too high a percentage for routine use in the absence of sequence data.

We too would love to see a robust way to generate lots more genotypic data, but it has to be highly accurate. While we wait for sequence costs to drop low enough, one alternative would be a "SNPedia chip", i.e. a custom microarray containing all of the SNPs in SNPedia. This would yield ~4X more coverage than the DTC chips in current use, at least in terms of accurately called SNPs mentioned in the medical (or genealogical) literature.
Gilgamesh
Contributor
Posts: 1711
Joined: Sat Oct 26, 2013 11:31 am
Location: Northeast US mostly

Post by Gilgamesh »

SNPedia wrote:A 20% non-concordance rate is way too high a percentage for routine use in the absence of sequence data.
As I said, I'm a beginner here, but my sense of the usability of imputation squares with what you're saying.
SNPedia wrote:One alternative would be a "SNPedia chip", i.e. a custom microarray containing all of the SNPs in SNPedia. This would yield ~4X more coverage than the DTC chips in current use, at least in terms of accurately called SNPs mentioned in the medical (or genealogical) literature.
I, for one, would greatly support such an effort, and would certainly be willing to pay a pretty penny for ~4X more coverage than currently possible with DTC chips like the one 23andMe uses for its customers.

Crowd-fundable?

GB
J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Post by J11 »

I am not sure enough of the technical details of imputation to debate the point.

It would be very helpful if someone could provide a link to research that showed the accuracy of imputations using the 1000 Genomes dataset. I think that positive reports are now being noted of the improved accuracy using the UK 10K reference set. In comparison to the currently available sequencing capacity these are small scale reference sets. Current global sequencing capacity is already in the 100s of thousands of full genomes per year. Imputing off of these datasets would likely yield some very impressive results. It will be great to have the imputation software available for the public when the reference sets can report very accurate results. Perhaps the big challenge will be to create a Full Genome gene chip that could capture the maximum possible of the haplostructure of the human genome. It would be interesting to see how many SNPs were required to accomplish this goal.

Something that is not clear to me about the above noted article was the 3 dropped SNPs. They were dropped due to low concordance. The reported results that would be of interest to people would then have 100% SNPs that were of high concordance.

Most of the GWAS studies use imputation to find novel disease associations. For example, they would start out with 1 million SNPs and then might impute 7-10 million SNPs. Their entire methodology is based on imputation. Imputation would need to be quite accurate in order for them to pick up on the often weak signals present in some of the studies. It seems counter-intuitive to me that imputation would not be accurate for those wanting to determine their disease risk as found in the GWAS studies, when the GWAS themselves are based so heavily on imputation.

Yes, it would be great to have special purpose chips. I am sure that there would be a considerable market for a $50 gene chip that would include all the variants of interest for Alzheimer's or other such complex illnesses. Even with present knowledge such a chip could be helpful. The research Psych chip might also have a wider mainstream market than the more focused research application now being pursued. I am interested in such products, though it has been difficult finding a company willing to provide such a service. Gene chips seem to be a low end commodity product that many genomics companies have avoided.

I am still quite pleased that I now have the imputation file. It is going to be very interesting to compare the 50 million SNPs in the imputation file against the 60 million base pairs that were sequenced in the exome. Imputation would seem to have a useful application in verifying the suspect calls in the exome file. Sequencing has particular difficulty with long strings of identical nucleotides. Sequencers have trouble distinguishing between 20 consecutive Cs and 21 Cs. Having the sequencing centers clarify what actual variation is found at such sites could greatly reduce the confusion that is caused for exome or full genome customers.
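
To give a feel for where such trouble spots sit, here is a minimal Python sketch that lists long runs of a single nucleotide in a sequence string. The function name, the `min_len` cutoff of 8, and the example sequence are all made up for illustration; nothing here comes from a real read or from any sequencing center's pipeline.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=8):
    """Return (start, base, length) for every run of one base >= min_len.

    `start` is a 0-based offset into `seq`. The cutoff is an arbitrary
    illustrative value, not a published threshold.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Made-up sequence: 3 bases, then 20 consecutive Cs, then 3 bases.
example = "ATG" + "C" * 20 + "GTA"
print(homopolymer_runs(example))
```

Calls that fall inside or right next to a run flagged this way are the ones I would want to look at by hand before believing them.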
Lamilla
Contributor
Posts: 14
Joined: Wed Oct 30, 2013 8:43 pm

Post by Lamilla »

J11 wrote:
I have already been able to use my imputation to correct the exome scan. Many of the most interesting variants from the exome scan were likely errors. I was very worried about some APP variants that seemed to be likely causes for our loved one's AD. The imputation clearly found that these SNPs were sequencing artifacts.
You mentioned earlier in the thread that the exome DNA sequence had 65x coverage. This is clinical grade sequencing, which is the gold standard. It can be used as the reference for all other DNA data, e.g. microarray SNPs from 23andMe and any imputation data. In other words, your exome sequence is correct.

If you see any differences between your exome sequence and imputation data, then the imputation data is incorrect. This will give you an idea of how accurate the imputation data is for your particular genome. You mentioned some differences, so it indicates that the imputation data has some obvious errors.
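
A rough way to put a number on that comparison is a concordance check over the sites that appear in both files. The Python sketch below assumes the genotypes have already been loaded into two plain dicts keyed by SNP id; the function name and the four example SNPs are hypothetical. It only lists the disagreements and does not decide which source is right at any given site.

```python
def concordance(sequenced, imputed):
    """Compare genotype calls at sites present in both dicts.

    Genotypes are treated as unordered allele pairs, so "AG" and "GA"
    count as the same call. Returns (concordance_rate, mismatched_ids).
    """
    shared = sorted(set(sequenced) & set(imputed))
    mismatches = [
        snp for snp in shared
        if sorted(sequenced[snp]) != sorted(imputed[snp])
    ]
    if not shared:
        return float("nan"), mismatches
    return 1 - len(mismatches) / len(shared), mismatches


# Hypothetical genotypes, invented for illustration only.
sequenced = {"rsA": "AG", "rsB": "CC", "rsC": "TT", "rsD": "GA"}
imputed = {"rsA": "GA", "rsB": "CT", "rsC": "TT", "rsE": "AA"}

rate, mismatches = concordance(sequenced, imputed)
print(f"concordance {rate:.3f}, mismatches: {mismatches}")
```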
ApoE 3/4, MTHFR C677T (+/+)
J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Post by J11 »

Yes, that is true about it being a clinical grade exome scan. This is why I kept giving them more money; I wanted to move toward a true clinical grade result. I did not want to be stuck with lots and lots of sequencing artifacts. I probably should have moved it up to 75x coverage. 65x is only an average and when you go through the reads carefully you can sometimes find examples where the exome sequence call seems clearly wrong. One perfect example was on a mitochondrial call. For some reason the aligner did not shift over the base pairs when a deletion had been called elsewhere. Shifting the other base pairs in this way would have allowed many nts to fall into position properly and would have been the most parsimonious alignment.

Another example (below) is one of the scariest strings of variants in the whole exome scan. I thought for a while that this was the cause of our family's dementia. It is in the APP gene, one of the main drivers of early onset AD. Mutation Taster called it disease causing. It is true that none of these variants actually passed QC, though the depth was fairly impressive (and strand bias existed). The imputation made the right call for the rs below. There is somewhat of an art involved in understanding an exome sequence. This region had a whole bunch of troubles on one read direction. Almost no errors existed on the opposite direction reads. The problem on this stretch was that there were 19 consecutive C calls made. Sequencers have quite a bit of trouble with so much repetition. My rule in such circumstances would be to tend towards accepting the imputed result over the sequenced one. The quality control for such repetitive regions at expert sequencing centers likely exceeds the QC on my exome in these regions.

chr21 27253243 . A AC 148.75 SB BaseQRankSum=5.509;DP=66;FS=61.949;HRun=4;HaplotypeScore=309.1533;MQ=50;MQ0=0;MQRankSum=-1.935;QD=2.25;ReadPosRankSum=0.147;SB=-0.00;CSQT=APP|NM_000484.3|3_prime_UTR_variant:feature_elongation GT:AD:DP:GQ:PL:MQ:GQX:VF 0/1:42,20:66:99:188,0,925:50:99:0.323
chr21 27253248 . T C 366.66 SB BaseQRankSum=-2.783;DP=62;Dels=0.00;FS=35.105;HRun=4;HaplotypeScore=27.5966;MQ=51;MQ0=0;MQRankSum=-3.698;QD=5.91;ReadPosRankSum=0.701;SB=-0.01;CSQT=APP|NM_000484.3|3_prime_UTR_variant GT:AD:DP:GQ:PL:MQ:GQX:VF 0/1:37,24:62:99:397,0,871:51:99:0.393
chr21 27253253 . A C 250.48 SB BaseQRankSum=-2.873;DP=63;Dels=0.00;FS=11.275;HRun=4;HaplotypeScore=30.8762;MQ=51;MQ0=0;MQRankSum=-2.494;QD=3.98;ReadPosRankSum=1.665;SB=-0.01;CSQT=APP|NM_000484.3|3_prime_UTR_variant GT:AD:DP:GQ:PL:MQ:GQX:VF 0/1:38,25:63:99:280,0,781:51:99:0.397
chr21 27253257 rs146774213 G C 130.22 SB BaseQRankSum=-4.350;DP=56;Dels=0.00;FS=25.915;HRun=4;HaplotypeScore=12.8918;MQ=51;MQ0=0;MQRankSum=-2.572;QD=2.33;ReadPosRankSum=-1.219;SB=-0.01;AA=G;GMAF=A|0.0014;CSQT=APP|NM_000484.3|3_prime_UTR_variant GT:AD:DP:GQ:PL:MQ:GQX:VF 0/1:35,21:56:99:160,0,977:51:99:0.375

So here, even at a depth of 56 and a PHRED-scaled quality of 130, the sequencer was leaning the wrong way. With a GMAF of 0.0014, I would go with the imputed result (reference homozygous). No idea why the alleles listed are G and C while the MAF is listed as A.
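
For anyone who wants to pull the same numbers out of a record like the ones above, here is a minimal Python sketch. It assumes the ten whitespace-separated columns as they appear in this post (the original file would normally be tab-separated), and the function name `parse_vcf_record` is just my own label. It reads the read counts from AD and the strand-bias score from FS; it does not judge whether the call is real.

```python
def parse_vcf_record(line):
    """Split one single-sample VCF record into the fields discussed here."""
    (chrom, pos, vid, ref, alt,
     qual, flt, info, fmt, sample) = line.split()

    # INFO is a ';'-separated list of KEY=VALUE pairs (some keys are flags).
    info_d = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_d[key] = value

    # FORMAT names the ':'-separated values in the sample column.
    sample_d = dict(zip(fmt.split(":"), sample.split(":")))
    ref_reads, alt_reads = (int(x) for x in sample_d["AD"].split(","))

    return {
        "site": f"{chrom}:{pos}",
        "id": vid,
        "ref": ref,
        "alt": alt,
        "filter": flt,                            # "SB" here, i.e. not PASS
        "qual": float(qual),
        "fisher_strand": float(info_d["FS"]),     # higher = more strand bias
        "qual_by_depth": float(info_d["QD"]),
        "ref_reads": ref_reads,
        "alt_reads": alt_reads,
        "alt_fraction": alt_reads / (ref_reads + alt_reads),
    }


# The rs146774213 record exactly as posted above.
record = (
    "chr21 27253257 rs146774213 G C 130.22 SB "
    "BaseQRankSum=-4.350;DP=56;Dels=0.00;FS=25.915;HRun=4;"
    "HaplotypeScore=12.8918;MQ=51;MQ0=0;MQRankSum=-2.572;QD=2.33;"
    "ReadPosRankSum=-1.219;SB=-0.01;AA=G;GMAF=A|0.0014;"
    "CSQT=APP|NM_000484.3|3_prime_UTR_variant "
    "GT:AD:DP:GQ:PL:MQ:GQX:VF 0/1:35,21:56:99:160,0,977:51:99:0.375"
)
call = parse_vcf_record(record)
print(call["filter"], call["ref_reads"], call["alt_reads"],
      call["alt_fraction"], call["fisher_strand"])
```

The alt fraction works out to 21 of 56 reads, which matches the VF value of 0.375 in the record itself.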

I definitely think that once there is a very large database of full genome and exome sequences (which is actually already true now, though not publicly available), imputation would be a very helpful tool for making corrections in exome sequences. Just think if there were 100 reads on 1 million individuals of a SNP and they were all reference. It would seem almost impossible to imagine that someone would then pull off a variant call, especially in a messy region as noted above. Of note is that three of the above calls were not even in the dbSNP database! There were quite a few other such calls in the exome file, though some of them seemed to be true calls. It will be great when dbSNP finally has a definitive list of the SNP variation in the human population. High quality imputation might then have truly arrived.



One thing that I found somewhat concerning was that many SNPs can be called with near 100% certainty as being reference homozygous simply because the variant allele has a very low frequency (perhaps 1 in a thousand or less). At that frequency there is only about 1 chance in a million that the true genotype is homozygous variant and only about 2 in a thousand that it is heterozygous. No imputation at all would be needed to make such a call. This would only be guessing and not really imputing. You could give someone a file with possibly millions of such "imputations", though this would be nearly meaningless.
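
The arithmetic behind that worry can be sketched in a few lines of Python. This assumes Hardy-Weinberg proportions (random mating, no selection), which is my simplifying assumption and not something any imputation service has confirmed it uses; the function name is made up.

```python
def hardy_weinberg(p_variant):
    """Expected genotype frequencies for a variant allele frequency p.

    Assumes Hardy-Weinberg proportions: q^2, 2pq, p^2 with q = 1 - p.
    """
    q = 1.0 - p_variant
    return {
        "hom_ref": q * q,
        "het": 2.0 * p_variant * q,
        "hom_var": p_variant * p_variant,
    }


# Variant allele at 1 in a thousand, as in the paragraph above.
for genotype, freq in hardy_weinberg(0.001).items():
    print(f"{genotype}: {freq:.6f}")
```

So simply answering "reference homozygous" at every such site would be right about 99.8% of the time without looking at any of the person's own data, which is why a high headline accuracy on rare variants says very little.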

I would like to update everyone on my progress with the variant of interest voyage. I have been able to get one of the difficult relatives to agree to provide a saliva sample. This is great. We have already found a large number of relatives online with the saliva of our loved one. This new relative will greatly increase the number of available matches. This could be especially helpful as we will try to find some rare matches on the mitochondrial or perhaps the Y lines to have a good starting point. If we could find a near match on the mitochondrial chromosome then we could start sorting out paternal and maternal matches.

Hopefully we can also get some cooperation with this from family members who know the family genealogy.

I regret now that I did not listen to advice given to me by members of an expert forum who suggested that I go with Ancestry.com. I did not realize that Ancestry.com performs phasing as part of its service. For some reason the other services do not appear to do this (23andMe might, though I am not sure). Knowing whether a relative is paternal or maternal (or at least being able to separate relatives into two unlabeled bins) would make the job a lot easier.

If we can convince other relatives to genotype, then we will probably go with Ancestry.com. As it is, the gene chip company that the sample will be sent to has a massive database that will greatly help us.