Merged or "super" kits?



JerryS.
10-10-2020, 06:46 PM
I've read a few posts that mention how a merged kit is better for accuracy etc. The reasoning was that each commercial test has its own shortcomings, so merging several kits helps ensure better accuracy when using G25, GEDmatch or Vahaduo. So I ask the learned folks here: is a merged kit made up of two or more tests any better than a single-test kit?

On another note, what is the V3 and V5 chip issue with 23andMe? Is the V5 [newer] 23andMe test better than the [older] V3 test? Is the imputation concern about converting V3 to V5, not about V5 converting to G25?

vettor
10-10-2020, 08:51 PM
From what I recall, the V3 test was the bigger one and the superior one. The V4 came in with the health part removed. The V5 is supposed to replace the V4, but still comes up short compared to the V3.

Basically, the V3 cost 23andMe more money to gather info from. That is the reason why it was stopped: not enough profit.

As for my kits, mine is still V3 and all the other family kits are V4.

dosas
10-10-2020, 09:01 PM
The one I am currently using is a raw file with all SNPs from all raw files ever produced (lol) that I extracted from my Whole Genome Sequence data.

Compared to every other raw file I used in the past for G25s, merged and non-merged, the percentages are shifted a bit in relation to modern populations. In relation to ancient populations, the difference is usually below 1%.

Hope it helps.

JerryS.
10-10-2020, 09:25 PM
so could I get my V5 converted to V3 or would that still be missing some things?

JerryS.
10-10-2020, 09:31 PM
I signed up and submitted my raw data for the FGC beta program since it was free, but 3 months into it, that program fell apart like a cheap suit because of the coronavirus. No word on it starting back up for some time now. I had hoped it would clear some things up for me.

digital_noise
10-10-2020, 09:41 PM
Converting a v5 to v3 adds even more imputation than the v5 currently has. Let me give an example. My AncestryDNA, FTDNA, and MyHeritage raw files all produce the same results in GEDmatch. The 23andMe v5 gets close but is off by a few percent in certain categories. The v5-to-v3 conversion basically matches the AncestryDNA, MyHeritage and FTDNA files. It's not worth the effort if you already have an AncestryDNA kit.

Doing a super kit covers the most bases on paper, but in reality, with GEDmatch at least, it doesn't offer any superior accuracy. For G25 I don't think any of this applies. You have AncestryDNA-derived coordinates; that's the best you're gonna get.

digital_noise
10-10-2020, 09:43 PM
JerryS., dosas is talking about the big-$$ whole genome sequencing, not the free autosomal analysis.

dosas
10-10-2020, 09:56 PM
This is a demonstration, using the moderns list. There is a small % difference, but, make of it what you will.



Target: dosas_scaled_23ame_v5
Distance: 2.0025% / 0.02002502
51.6 Greek_Trabzon
44.6 Albanian
1.8 Lithuanian_VZ
1.0 Papuan
0.8 Greek_Central_Anatolia
0.2 Luhya_Kenya





Target: dosas_All_SNPs_scaled
Distance: 1.0613% / 0.01061330
47.6 Greek_Trabzon
46.4 Albanian
3.4 Lithuanian_VZ
1.0 Greek_Central_Anatolia
0.8 Papuan
0.4 Cameroon_Bangwa
0.4 Esan_Nigeria
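
For anyone who wants to line the two runs up side by side, here is a minimal Python sketch that takes the percentages exactly as posted above and prints the per-population shift. The function name `compare_runs` and the dictionary layout are just illustrative choices, not part of any tool mentioned in this thread.

```python
# Percentages copied from the two outputs posted above.
v5_run = {
    "Greek_Trabzon": 51.6,
    "Albanian": 44.6,
    "Lithuanian_VZ": 1.8,
    "Papuan": 1.0,
    "Greek_Central_Anatolia": 0.8,
    "Luhya_Kenya": 0.2,
}
all_snps_run = {
    "Greek_Trabzon": 47.6,
    "Albanian": 46.4,
    "Lithuanian_VZ": 3.4,
    "Greek_Central_Anatolia": 1.0,
    "Papuan": 0.8,
    "Cameroon_Bangwa": 0.4,
    "Esan_Nigeria": 0.4,
}


def compare_runs(a, b):
    """Return b minus a for every population seen in either run.

    A population missing from one run is treated as 0%.
    """
    pops = set(a) | set(b)
    return {p: round(b.get(p, 0.0) - a.get(p, 0.0), 1) for p in pops}


diffs = compare_runs(v5_run, all_snps_run)

# Print the biggest shifts first.
for pop, delta in sorted(diffs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{pop:25s} {delta:+.1f}")
```

The largest movement is in Greek_Trabzon, with everything else moving by less than two points, which matches the "small % difference" described above.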

JoeyP37
10-10-2020, 10:49 PM
I have noticed my Ancestry and 23andMe kits give different results; it seems to me like the Ancestry kit is more precise.
23andMe: [attachment 40154]
Ancestry: [attachment 40155]

JerryS.
10-10-2020, 10:56 PM
Yes, I think the differences are more noticeable with regionally mixed people. What model are those numbers from?

JerryS.
10-10-2020, 10:58 PM
As I mentioned above, I think the differences are more noticeable with regionally mixed people. Maybe that's why some mentioned the merged kits as being better.

digital_noise
10-11-2020, 01:17 AM
It’s not a big enough difference even for mixed people really.

First is a normal AncestryDNA file. Second is a "super kit" made up of, I think, AncestryDNA, 23andMe and either FTDNA or MyHeritage. Both were run on Eurogenes K13.
Notice the SNP counts. I don't have a 23andMe V5 kit uploaded anymore, but I think it uses only about 56,000 SNPs.

[attachment 40166]
[attachment 40167]
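
To make the SNP-count point concrete, here is a rough Python sketch of what "SNPs used" could mean: the number of positions in a calculator's panel for which a kit has a real call. Everything in it is a toy assumption: the rsIDs are made up, `usable_snps` is a hypothetical helper, and the no-call markers (`--` and `00`) are simply the ones commonly seen in raw data files. It says nothing about how GEDmatch counts internally.

```python
# Genotype strings treated as "no call" (assumed markers, see note above).
NO_CALLS = {"--", "00", ""}


def usable_snps(kit_calls, panel):
    """Count panel SNPs for which the kit has a real (non no-call) genotype."""
    return sum(1 for rsid in panel if kit_calls.get(rsid, "") not in NO_CALLS)


# Made-up calculator panel and kits, for illustration only.
panel = ["rs1", "rs2", "rs3", "rs4", "rs5"]
single_kit = {"rs1": "AG", "rs2": "--", "rs3": "CC"}
super_kit = {"rs1": "AG", "rs2": "TT", "rs3": "CC", "rs4": "GG", "rs9": "AA"}

print("single kit:", usable_snps(single_kit, panel))
print("super kit: ", usable_snps(super_kit, panel))
```

Note that `rs9` in the toy super kit adds nothing because it is not in the panel. A merged kit only helps where the extra SNPs are ones the calculator actually looks at.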

sktibo
10-11-2020, 06:12 PM
Jerry, you can see which kit has more data because it'll show a lower distance in a G25 model. IIRC your merged kit doesn't actually show the lowest distance when you post multiple kits for a single model. In GEDmatch you can see the number of SNPs used in a run. I've never found a merged kit in my experience to use more data than a base kit. Maybe there's a newer way to do it now, but..

JerryS.
10-11-2020, 07:02 PM
I understand that the lower the number/distance the better, but I've also read that the lower number isn't always the best if you modify somebody's original amateur model to fit what you believe about your own ancestry. I figured if you add or delete population samples from a model to fit your own ancestry and still get a reasonably low number close to the original, it was still accurate.

sktibo
10-11-2020, 07:32 PM
I'm not talking about making a model. I'm talking about how you can tell which kit is using more SNPs.

digital_noise
10-11-2020, 09:04 PM
A low distance is not always “most accurate”. It depends on multiple things, one being the calc itself and what samples are chosen for it. Related to that, you need a fairly in-depth knowledge of your background. If someone has no idea what they are, G25 is not going to help much, because each calc will give something a bit different.

What is your end goal with possibly building a “super kit”?

JerryS.
10-11-2020, 10:28 PM
I didn't possibly build anything. I gave somebody my Ancestry and 23andMe kits to merge into one because I thought (from what I've read here from others) that it produces a more well-rounded kit, one that works well with all the models instead of needing updates and chip conversions and all that jazz.

digital_noise
10-11-2020, 10:51 PM
You know what I meant...

Anyways, for G25 it’s more than just combining and averaging. You need to merge the raw files and have Davidski generate the coordinates from the merged file.


Ancestry derived scales cover almost all bases.
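
For readers wondering what "merge the raw files" involves, here is a minimal Python sketch under stated assumptions. It assumes two whitespace-separated raw files: one with a single genotype column (`rsid chromosome position genotype`) and one with two allele columns (`rsid chromosome position allele1 allele2`). `parse_raw` and `merge_kits` are hypothetical helpers written for this illustration. This is not how DNA Kit Studio or anyone's actual pipeline works, and it ignores real-world problems such as strand flips and genome build differences.

```python
# Genotype strings treated as "no call" (assumed markers).
NO_CALLS = {"--", "00", ""}


def parse_raw(text):
    """Parse raw-data text into {rsid: (chromosome, position, genotype)}."""
    calls = {}
    for line in text.splitlines():
        # Skip blanks, comment lines, and a plain column-header line.
        if not line.strip() or line.startswith("#") or line.lower().startswith("rsid"):
            continue
        fields = line.split()
        # One genotype column ("AG") or two allele columns ("A", "G").
        calls[fields[0]] = (fields[1], fields[2], "".join(fields[3:]))
    return calls


def merge_kits(a, b):
    """Union of two kits: fill no-calls from the other kit, blank out conflicts."""
    merged = {}
    for rsid in a.keys() | b.keys():
        ca, cb = a.get(rsid), b.get(rsid)
        if ca is None or ca[2] in NO_CALLS:
            pick = cb if cb is not None else ca
        elif cb is None or cb[2] in NO_CALLS:
            pick = ca
        elif sorted(ca[2]) == sorted(cb[2]):
            pick = ca  # same genotype, allele order aside
        else:
            pick = (ca[0], ca[1], "--")  # kits disagree: do not guess
        merged[rsid] = pick
    return merged


# Made-up rows, for illustration only.
kit_one = "# rsid\tchromosome\tposition\tgenotype\nrs1\t1\t100\tAG\nrs2\t1\t200\t--\nrs3\t1\t300\tCC\n"
kit_two = "rsid\tchromosome\tposition\tallele1\tallele2\nrs1\t1\t100\tG\tA\nrs2\t1\t200\tT\tT\nrs3\t1\t300\tC\tT\nrs4\t2\t50\t0\t0\n"

merged = merge_kits(parse_raw(kit_one), parse_raw(kit_two))
for rsid in sorted(merged):
    print(rsid, *merged[rsid])
```

Marking disagreements as no-calls rather than picking a side is a deliberately conservative choice in this sketch. A real merge tool may resolve conflicts differently.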

JerryS.
10-11-2020, 11:05 PM
that's what I did. I had somebody merge the files then paid for the file to be G25'ed.

digital_noise
10-11-2020, 11:25 PM
Ok. Again, I don't think it's doing you any good. I've said before that I asked David the exact same thing, and he replied to just use the AncestryDNA file.

JerryS.
10-11-2020, 11:28 PM
Yes, the merged kit is closer to the Ancestry kit than to the 23andMe kit. Regarding the 23andMe data, is the V3 better than the V5 or the other way around? V5 is the newer version, correct?

digital_noise
10-11-2020, 11:47 PM
Yes. V5 is the most recent. In terms of 23andMe, yeah, I think it's probably the best. But it's equal to the AncestryDNA raw data.
You cannot get the v3 anymore and all the v5-to-v3 conversions are heavily imputed.

JerryS.
10-11-2020, 11:49 PM
you mean isn't equal, correct?

digital_noise
10-12-2020, 12:38 AM
No, v3 data is equal to AncestryDNA data in Gedmatch terms. And likely G25. So there’s no point in doing any sort of conversion from v5 to v3 if you already have AncestryDNA data.

Sorry, I was not clear in the previous response. The v5 is the black sheep in the raw data world. That's why people will often convert from v5 to v3, but the conversion imputes the data, which essentially means estimating the missing SNPs.

JerryS.
10-12-2020, 12:40 AM
not to mention the cost of doing so.

digital_noise
10-12-2020, 01:02 AM
One of the free tools, DNA Studio or Admixture Studio (I can't remember which), can actually convert it.

aaronbee2010
10-12-2020, 01:43 AM
DNA Kit Studio is the one you're thinking of. Admixture Studio just runs ethnicity calculators (the PRO version has some more features, though).

Caius Agrippa
10-13-2020, 10:42 PM
Whole genome sequencing and an AncestryDNA file would probably give similar results in most of these tools. That is because the DIY tools we are using now are meant to be used with the chips of 23andMe, Ancestry and other companies. In the end they don't read all of your SNPs, even if you did a Dante Labs test. I don't know of any ancestry test yet that is able to make use of full genome sequencing.

Robert1
10-14-2020, 03:47 AM
Speaking of an end goal of building a super kit, I've been thinking of doing super kits at GEDMatch (merging older FTDNA OmniExpress and Living DNA GSA kits to get ~800,000 SNP kits) of my mother, paternal aunt, sisters and me. Then use them to build a Lazarus kit of my father. Usually you're lucky to build a Lazarus kit close to 50% of a dead person's SNPs but with good super kits maybe you can do better. Makes sense so I need to give this a try.

My current Lazarus kit for my father doesn't perform as well for DNA matches as his sister's (my paternal aunt's) kit.

Hmmm, come to think of it the current Lazarus kit is all from OmniExpress kits so I guess I could just build a Lazarus file from the same people using their GSA kits then merge the two Lazarus kits into a super Lazarus kit. It would be interesting to compare the three Lazarus kits, too.

Saetro
10-15-2020, 09:06 PM
As you have 3 siblings (your sisters and yourself), you could also try Visual Phasing for your father's DNA, but the Lazarus approach is going to be quicker.
There used to be a Facebook group or two for Visual Phasing and there is a chapter in "Advanced Genetic Genealogy" ed. Debbie Parker Wayne.