Genetic genealogy raw data and Alzheimer's research ...

Alzheimer's, cardiovascular, and other chronic diseases; biomarkers, lifestyle, supplements, drugs, and health care.
J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by J11 »

I thought it would be interesting to demonstrate my claim that our relatives provide near-saturation coverage of our loved one's genome.
So, I went through the 1500 matches on GEDmatch to see how saturated the coverage actually was on chromosome 3.
(Noted earlier was an interesting variant on chromosome 3 near 174M in the NLGN gene.)

After going through the matches, I indeed found an impressive level of coverage. Using only GEDmatch matches with segments of 7 cM or longer,
approximately 70% of the chromosome had at least one associated relative. After including high-quality matches on 23andme, several of the
gaps were filled in, increasing coverage to 80%. One segment (from 65M-165M) had nearly continuous coverage over 100 megabase pairs!
For a large portion of chromosome 3 there was at least one relative (paternal or maternal) of our loved one. Disappointingly, the 174M region did not have a high-quality match, though there were several 5-6 cM matches. Often these smaller matches will be false matches. Phasing apparently is able to weed some of these out. Phasing would be helpful for me, though I am not sure how to go about it. Impute2 has a phasing function, and Ancestry.com supposedly does phasing for its customers. It would be great if DNA Land were to do phasing.
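A minimal sketch of how a coverage figure like this could be computed: merge the overlapping matched segments on one chromosome and divide the covered length by the chromosome length. The segment coordinates below are made up for illustration, and the chromosome 3 length of roughly 198 Mb is an approximation, not a value exported from GEDmatch or 23andme.

```python
def merge_segments(segments):
    """Merge overlapping (start, end) intervals, given in base pairs."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(segments, chrom_length):
    """Fraction of the chromosome covered by at least one matched segment."""
    covered = sum(end - start for start, end in merge_segments(segments))
    return covered / chrom_length


# Hypothetical matched segments on chromosome 3 (illustrative values only).
example_segments = [
    (5_000_000, 30_000_000),
    (25_000_000, 40_000_000),
    (65_000_000, 165_000_000),
]
CHR3_LENGTH = 198_000_000  # assumption: approximate length of chromosome 3

print(f"Coverage: {coverage_fraction(example_segments, CHR3_LENGTH):.0%}")
```

Segments are measured in base pairs here for simplicity; the 7 cM threshold mentioned above would be applied as a filter before the segments are merged.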

It does, in fact, appear that near-saturation genome coverage has been reached. It will be interesting to see how much more coverage can be obtained when we receive the results from a sample that is being sent to FamilyTreeDNA.com.
ApropoE4
Contributor
Posts: 396
Joined: Sun Feb 02, 2014 10:43 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by ApropoE4 »

I'm still not at all sure what you mean. Can you explain, without using percentages, site names, etc., what you think can be achieved and how?

For example, my gedmatch profile shows 1500 matches with generation matches between 3.6 and 4.7 (these are rubbish because of my ancestry, but let's assume they're true). What do I know now, or what could I know, that 23andMe doesn't already know?
sarahb12
Senior Contributor
Posts: 196
Joined: Mon Nov 04, 2013 8:21 pm
Location: Boise, id

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by sarahb12 »

I think the fact that people want to be citizen scientists is exciting, especially when it comes out of frustration that those who are empowered to advance medicine often have financial conflicts of interest and simply don't.

Are you aware of DIYgenomics? People are starting their own studies.

Sarah
E3/E4
ApropoE4
Contributor
Posts: 396
Joined: Sun Feb 02, 2014 10:43 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by ApropoE4 »

I'm sure that's true; I'm just having trouble following J11's logic.
J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by J11 »

Well yes, there is a certain amount of big hat no cattle going on here. I'll just see where this leads me.

OK. For example, I recently found a very interesting variant in the NLGN gene around position 174M on chromosome 3. This SNP was reported as disease-causing by MutationTaster and is super rare. There is some research supporting a link between the NLGN gene and AD. This is pretty thin evidence, though it lets me sketch out my new strategy.

Whenever new AD risk variants like this come to my attention, I can immediately go to GEDmatch and find those with a matching segment. If there had been a matching segment for the 174M region, then I could have immediately contacted that match and asked whether there had been dementia problems in the family. If there had been, then I would have narrowed down the search space to only 7 cM or less!
Such a small search region might allow the disease-causing mutation in the exome file to be found.
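The lookup step described above can be sketched as a simple filter over a list of matched segments. Everything here is hypothetical: the `Segment` fields, the match names, and the coordinates are invented for illustration and do not reflect the actual export format of GEDmatch or any other site.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    match_name: str  # hypothetical identifier for the matching relative
    chrom: str
    start: int       # base-pair position where the shared segment begins
    end: int         # base-pair position where the shared segment ends
    cm: float        # genetic length of the shared segment in centimorgans


def matches_at(segments, chrom, position, min_cm=7.0):
    """Return the segments that span `position` on `chrom` and are at least `min_cm` long."""
    return [
        s for s in segments
        if s.chrom == chrom and s.start <= position <= s.end and s.cm >= min_cm
    ]


# Made-up segment list; the variant of interest sits near 174M on chromosome 3.
segments = [
    Segment("match_A", "3", 170_000_000, 178_000_000, 8.2),
    Segment("match_B", "3", 172_500_000, 175_000_000, 5.4),  # below the 7 cM cutoff
    Segment("match_C", "3", 65_000_000, 90_000_000, 21.0),
    Segment("match_D", "7", 173_000_000, 176_000_000, 9.1),  # wrong chromosome
]

for hit in matches_at(segments, "3", 174_000_000):
    print(hit.match_name, hit.cm)
```

Lowering `min_cm` would surface the 5-6 cM matches as well, at the cost of more false matches.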

None of the current genomics providers have anything to say about family level disease risk. The big money that has been spent on GWAS studies also is silent about family inheritance. To a large extent none of these GWAS have in any way clarified familial AD. In fact 23andme lists our loved one, who is epsilon 33, as having a substantially reduced risk of AD. Most familial AD is a mystery.


I think there are a whole bunch of people having this "Ah ha" moment when they see that doing family studies with near to distant relatives just makes so much sense. I went through the matches on GEDmatch and it was amazing. There were all sorts of relatives with matching segments of 10, 20, or even more than 30 Megabasepairs! If we have a dominant AD mutation somewhere in our family tree it does not appear that it is going to be overwhelmingly difficult to find it.

Once this becomes more obvious to people, there will be a certain gene hunting mania. Platforms such as DNA Land could make this whole process so much easier. With the right software and overall setup, the sought-after variants will almost naturally be revealed.

This is all very interesting, and whoever can set up the variant discovery assembly line could be rewarded with considerable recognition.
Many researchers have slogged away at the family level trying to work out inheritance patterns. It can be tough sledding. If the entire infrastructure is set up online so that people have the tools they need to work this out efficiently, there could simply be an ocean of genetic discoveries.
ApropoE4
Contributor
Posts: 396
Joined: Sun Feb 02, 2014 10:43 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by ApropoE4 »

But why do you think 23andMe isn't doing this already?
J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by J11 »

It is interesting that even with all the power behind the 23andme brand, people at street level still have a chance to accomplish things that the company can't. It is a classic last mile problem. They have access to the cloud, a highly qualified workforce, a massive genetics database, etc., and yet they don't have the deep family knowledge that is not well expressed with simple survey questions.

And of course, it is the old "There are no Martians" argument. If 23andme had solved my family's genetics of dementia problem, then we would probably have heard about it through Promethease or elsewhere. At this time it seems fairly likely that the reason why we still do not know what is causing our family's dementia is that no one else knows either. (We have not seen any Martians because there aren't any.)

I am sure 23andme is trying to do some of this research, it is just that there are a few hurdles in their way.

For example, the 23andme health survey questions did not even ask about dementing illness. It was very surprising. Neurodegenerative illness is one of the few illness categories that seemed conspicuously absent from their surveys. Also, running the necessary exome scans would be quite expensive. We are going to test another family member at FamilyTreeDNA.com. 23andme will probably not have access to these results, nor will they have any idea what phenotype this family member has demonstrated nor will they have access to our family's genealogical knowledge nor our intuition on which family branch would be likely to house the variant of interest.

This is a tremendously exciting moment in Medical Genealogy. It feels like right now all the remaining questions of family inheritance that have seemed intractable with the methods used to this point can now be answered. This has only emerged over the last year or so. We are now receiving emails from 23andme announcing new relatives at the rate of more than 1 per day. Mainstream acceptance of genetic testing has now been achieved. Once saturation coverage of the genome has been reached (as I have noted previously has largely been achieved with our loved one) then resolving these genetic questions will be easy, perhaps even automatic.

This is why DNA Land and probably others are now setting up the infrastructure needed to finally unlock the genome at the family level.
They are going to be able to push this over the line for a whole bunch of disease categories and receive a large amount of glory in the process.
J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by J11 »

After working through the 23andme relatives spreadsheet, it appears that over 800 unique megabase pairs of our loved one's genome are shared among the top 100 matches by base pairs. This is fairly encouraging. Even just throwing a dart at the wall gives us fairly good odds of finding what we are looking for. There are a few labeled as anonymous, which is somewhat concerning as they might not be interested in helping out (indeed it might not even be possible to contact those labeled as anonymous). Fortunately, most of those noted as anonymous have proxies that could be used to report on these regions. It is unfortunate that 23andme does not offer better online tools. Life would be much easier in our search if they did.

I have also been investigating the matches on GEDmatch. Today I upgraded to Tier 1 and I am feeling very high class. Yes siree, I am now in the Tier 1 class! The segment matching tool in Tier 1 will make things quite a bit easier. Also the triangulation tool is pretty helpful. Upgrading to Tier 1 is obviously worth avoiding having to work through all the triangulations manually. Something that I also found interesting is that when I searched through the matches I noticed that some of the matches were double matches. So there might be a shared segment on chromosome 7 and also a matching segment on chromosome 10 with the same individual. There were almost a dozen of these double matches. Of especial note was that one of the matches also had a double match with another double match (that is, there was a bridging match between 2 double matches.) This might turn out to be quite helpful if additional doubles could be found. The idea is that having a double match would tell you all those related through the double match would be on one side of the family. Perhaps it could be possible to ultimately put all the relatives into two bins (paternal and maternal- though it might not be clear which was which.)
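One way to sketch the binning idea is as a connected-components problem: treat each shared segment as a node, link any two segments that the same relative matches on (a double match), and read off the resulting clusters. This is only an illustration under strong assumptions: the segment labels are hypothetical, two relatives count as sharing a segment only if they carry the same label (which would have to be established by triangulation), and a relative who is related through both sides of the family would wrongly merge the two bins.

```python
from collections import defaultdict


def bin_segments(relative_segments):
    """Cluster segment labels that are linked through relatives with multiple matches."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for segs in relative_segments.values():
        if not segs:
            continue
        find(segs[0])  # register single-segment relatives too
        for other in segs[1:]:
            union(segs[0], other)

    bins = defaultdict(set)
    for seg in list(parent):
        bins[find(seg)].add(seg)
    return sorted(sorted(b) for b in bins.values())


# Hypothetical relatives and the (labeled) segments each one matches on.
relative_segments = {
    "R1": ["chr7:A", "chr10:B"],   # double match
    "R2": ["chr10:B", "chr12:C"],  # bridging match: shares chr10:B with R1
    "R3": ["chr2:D", "chr5:E"],    # unrelated double match
}

for cluster in bin_segments(relative_segments):
    print(cluster)
```

With enough double matches the clusters might collapse toward two bins, one per parental line, though nothing in the procedure says which bin is paternal and which is maternal.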

I am also trying to figure out whether it would be possible to pivot off some of the triangulations. For example, consider a triangulation with A, B and C on chromosome 1. They all have a common ancestor. Now what if C has another matching segment with A on chromosome 1, though not with B? Does the previous triangulation allow one to conclude that this additional matching segment is another piece from the common ancestor? I am not totally sure, though if this were true it might allow considerably extending the triangulations and double matches.

I can definitely appreciate the excitement that is now occurring on the genealogy forums. Family relatedness is now becoming transparent.
Having the right software and enough CPU horsepower might be about all that is holding things back. I am impressed with how much GEDmatch is able to reveal even with only 100,000 people in their dataset. I would be very interested to see what could be done with the proper tools on 23andme, Ancestry.com or FamilyTreeDNA.com.
J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by J11 »

Hmm, that is pretty exciting.

There were 150 double segment matches on 23andme! With that many double matches it might be possible to nearly cleanly separate the two lines of a family.

We've been riding this exponential curve of uptake of genotyping. After all the doublings, we have finally hit the point where the next doubling fills the lake with flowers! Nobody pays much attention to doublings at first, until things start to get very interesting. Another doubling would fill in most of the remaining gaps for people. As 23andme moves from 1 million to 2 million customers, the entire population genomics network will go online. This is already what the genealogical pros are talking about.
J11
Contributor
Posts: 3351
Joined: Sat May 17, 2014 4:04 pm

Re: Genetic genealogy raw data and Alzheimer's research ...

Post by J11 »

What I am now excited about is reconstructing our loved one's parents' DNA. The Tier 1 service on GEDmatch offers such a feature.
Our loved one has 2 sisters available for genotyping. With 3 siblings, and treating each chromosome as inherited whole (ignoring recombination), it would be expected that 3 out of 4 times both copies of a given parental chromosome would have been inherited by at least one of the siblings, and that 1 out of 4 times only one of the two copies would have been. This means that about 77 of the 88 parental autosomal chromosomes (87.5%) would be expected to be recovered.
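The arithmetic can be checked with a short calculation. This is a simplified model, not a description of how the GEDmatch tool works: it assumes each sibling independently inherits one of a parent's two copies of a chromosome with probability 1/2 and ignores recombination, which in reality mixes the two copies.

```python
def expected_recovered(n_siblings, n_autosome_pairs=22, n_parents=2):
    """Expected number of parental autosomal chromosomes seen in at least one sibling."""
    # Only one copy is recovered when every sibling inherited the same copy.
    p_only_one = 2 * 0.5 ** n_siblings
    p_both = 1 - p_only_one
    # Expected copies recovered out of the 2 copies a parent carries.
    per_pair = 2 * p_both + 1 * p_only_one
    total = n_parents * n_autosome_pairs * 2
    return per_pair * n_parents * n_autosome_pairs, total


recovered, total = expected_recovered(3)
print(f"{recovered:.0f} of {total} ({recovered / total:.1%})")
```

With 3 siblings the chance of recovering both copies is 1 - 2 x (1/2)^3 = 3/4, which gives 1.75 copies per pair and 77 of 88 overall. Each additional sibling halves the chance that a copy is missed.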

Additionally, as I noted above, there are 150 double matches on 23andme. With the double matches, it should be possible to correctly bin the chromosomes (even if it were not known whether they were paternal or maternal). If there are double matches through the X chromosome, then it should be possible to assign the bins as paternal or maternal.

After doing this, we should have nearly completely reconstructed genomes for both our loved one's parents. It would then be obvious which of the relatives were through the maternal line that houses the dementia variant. This should cut the search space in half. Furthermore, the part of the reconstructed maternal genome that was not present in our loved one could not house the dementia variant. GEDmatch allows these reconstructed genomes to contain as little as 1500 cM. Using the reconstruction technique suggested above, there would be 7000 cM. This might allow us to exclude substantial jointly inherited regions of the genome when doing the reconstruction. This could further greatly reduce the search space.