U152 Predictions with Nevgen



R.Rocca
02-28-2017, 03:36 PM
To my surprise, my little cluster has been accurately predicted using the R1b specific setting of the Nevgen predictor found here:

http://www.nevgen.org/

My group is U152 > L2 > ZZ48 > ZZ56 > FGC10516. I'm positive for some subclades as well, but I think it is great that this tool predicted FGC10516 as 100% probability, especially at only 67 markers. Have other U152 folks had similar success?

razyn
02-28-2017, 04:06 PM
The guys who created that predictor have updated the R1b part several times, including as recently as yesterday, I think. (Anyway, in Feb. 2017.) I posted about it on a more general thread today: http://www.anthrogenica.com/showthread.php?8873-Y-Haplogroup-prediction-help&p=216786&viewfull=1#post216786

MitchellSince1893
02-28-2017, 05:14 PM
I just get

R1b (for 67+ markers, try level for R1b-s, 160+ subclades)
100 56.58 1.14

Is there some extra step I'm missing?

jbarry6899
02-28-2017, 05:40 PM
Correctly predicted S8183 for me, with 61% probability.

MitchellSince1893
02-28-2017, 06:01 PM
So you are just using the main page to get this result?

jbarry6899
02-28-2017, 06:14 PM
On the left-hand side there are what look like three sliders. Click those to select the matching algorithm. Choose the one for R1b subclade.

mangumheel
02-28-2017, 07:11 PM
By SNP testing I am Z34+ Z35-, confirmed by FTDNA as CTS9044. Nevgen predicted me as Z33 with 98.6% probability. Is Z33 equivalent to Z34+ but Z35-?

MitchellSince1893
02-28-2017, 10:11 PM
Thanks. It won't work on my smartphone, so I'll try it on a laptop.

MitchellSince1893
02-28-2017, 11:07 PM
OK, now that I've used the correct calculator with 111 markers, I got R1b U152>L2>Z49>Z142> Z150>FGC12378 at 100% probability. That is 100% correct.

I was able to cut down to the first 52 markers and still get a 100% probability of the correct branch.
1-37 markers give 99.46% probability of FGC12378.
1-33 markers give 91.9% probability of FGC12378.
1-25 markers give 67.1% probability of FGC12378.
1-12 markers give 6.25% probability of FGC12378.

That's quite impressive.

MitchellSince1893
03-01-2017, 01:16 AM
Just tried it on 3 other SNP-test-confirmed U152>L2+ folks and it's not doing very well.

At 67 markers it said the first man had:
64.9% chance he's R1b L21>DF13>ZZ10>Z255> Z16429
And less than 10% chance he's R1b U152>L2

The other U152>L2+ fellow at 67 markers had:
41.6% chance R1b L21>> ZZ10>Z253> Z2186>L106,
28.9% chance R1b L21>DF13> ZZ10>Z253> S847
And only 0.72% chance of actually being L2+

An L2>Z49+ fellow at 67 markers came out:
30.35% chance R1b L51>L151> CTS4528> S14328
20.95% chance R1b L21>DF13> FGC5494
17.42% chance R1b L21>> ZZ10>Z253> Z2186>L1066
0.02% chance R1b U152>L2> Z49

Oh well. I was all excited based on how accurate it was with my results, but these 3 samples show it's not very accurate on other results.

emmental
03-01-2017, 01:49 AM
I get R1b U152>Z36 Cluster 2 (100%). If cluster 2 is FGC6418 et al, it's correct.

mafe
03-01-2017, 01:58 PM
Probability of unsupported subclade: 71.73%
R1b U152>L2> Z49>Z142> L562: 18.65%
R1b U152>Z36 Cluster 5: 4.96%
R1b U152>L2> Z258>L20: 4.51%

nroelofs
03-01-2017, 05:03 PM
My results are:
R1b U152>L2> Z49>Z142> FGC22963
This is correct for me.

Kelso
03-01-2017, 05:28 PM
My results are mostly right!

R1b U152>L2>PF6658>> FGC5033 at 100%; Fitness, 72.13 and Fitness 2, 1.35

It picked up my PF6658 but gave me L2 which I do not pass through. I am L2- and go U152>PF6658. FGC5033 is my last shared block of SNPs.

Tim

kw5368
03-03-2017, 06:33 PM
Mine found R1b U152>L2>Z49>Z142> Z150>BY1701 at 111 markers. Fitness 80.73 Fitness 2 1.34

It missed BY1542/L654>S42>CTS7197

Pretty close to perfect considering there are not too many of us at BY1542/L654>S42.

kw5368
03-03-2017, 07:42 PM
My results are mostly right!

R1b U152>L2>PF6658>> FGC5033 at 100%; Fitness, 72.13 and Fitness 2, 1.35

It picked up my PF6658 but gave me L2 which I do not pass through. I am L2- and go U152>PF6658. FGC5033 is my last shared block of SNPs.

Tim

I don't think you can be PF6658 without being positive for L2. PF6658 is part of the L2 pack at FTDNA.

MattL
03-05-2017, 06:03 AM
With only 37 markers 100% probability of "R1b U152>L2> Z49>Z142> FGC22963"
Fitness 71.45
Fitness2 1.27

This is correct.

delegz
03-06-2017, 11:24 PM
With my 106 marker results it finds that I have only 0.2% probability of being "R1b U152>L2>Z49>Z142>FGC22963", which is in fact exactly what I have.

Where do I find the fitness results?

Matt, I wonder whether the descendant of William C. Langley in the R-U152 Project who's also tested positive for FGC22963 and has FTDNA's 101 marker results would also be found by NEVGEN not to be in its subclade? Think I'll go find out now....

MitchellSince1893
03-07-2017, 02:49 AM
I can't help but wonder if the creators of this tool have entered many of the individual haplotypes for certain SNPs so that when you re enter those same values you obviously come up 100% correct; and the ones that come up way off are because those haplotypes have not yet been entered into the tool. :unsure:

MattL
03-08-2017, 09:04 AM
With my 106 marker results it finds that I have only 0.2% probability of being "R1b U152>L2>Z49>Z142>FGC22963", which on the contrary is exactly what I have.

Where do I find the fitness results?

Matt, I wonder whether the descendant of William C. Langley in the R-U152 Project who's also tested positive for FGC22963 and has FTDNA's 101 marker results would also be found by NEVGEN not to be in its subclade? Think I'll go find out now....

hmm, odd, maybe the more markers the more confused it gets?


I can't help but wonder if the creators of this tool have entered many of the individual haplotypes for certain SNPs so that when you re enter those same values you obviously come up 100% correct; and the ones that come up way off are because those haplotypes have not yet been entered into the tool. :unsure:

It's entirely possible.

ArmandoR1b
03-08-2017, 04:12 PM
I can't help but wonder if the creators of this tool have entered many of the individual haplotypes for certain SNPs so that when you re enter those same values you obviously come up 100% correct; and the ones that come up way off are because those haplotypes have not yet been entered into the tool. :unsure:
I doubt that to be the case. The predictor has been around long enough for them to have entered all of the haplotypes that are readily available from the FTDNA project pages by now. Try randomly selecting the results of kits with SNP testing that are visible without logging into your FTDNA account and put them into the predictor; you will eventually find subclades that aren't accurately predicted. It's just too hard to predict certain subclades even with 67 markers.

rms2
03-08-2017, 04:15 PM
I think we can chalk up some of the mispredictions to the rapid expansion of P312 in the Bronze Age and the sudden proliferation of clades from sires with strikingly similar haplotypes.

MitchellSince1893
03-08-2017, 05:28 PM
I doubt that to be the case. The predictor has been around long enough for them to have entered all of the haplotypes that are readily available from the FTDNA project pages by now. Try randomly selecting the results of kits with SNP testing that are visible without logging into your FTDNA account and put them into the predictor you will eventually find subclades that aren't accurately predicted. It's just too hard to predict certain subclades even with 67 markers.

I plead ignorance as to how such tools are programmed. It just struck me as odd that some samples are spot on at 100% and others are way off, e.g. less than 1% chance given to the correct branch. http://www.anthrogenica.com/showthread.php?9839-U152-Predictions-with-Nevgen&p=216895&viewfull=1#post216895

ArmandoR1b
03-08-2017, 06:23 PM
I plead ignorance as to how such tools are programed. It just struck me odd that some samples are spot on at 100% and others are way off e.g. less than 1% chance it gives the correct branch.

I don't find it odd. I find it par for the course. There was a time when FTDNA tried applying their own predictor to their own customers and it caused a lot of people to be in different subclades and, at times, even different haplogroups from what their own SNP testing had shown them to be. They had to undo all of the predictions. Before that I would see predictions by other people that I had to argue about since the predictions didn't make sense and I turned out to be right.


http://www.anthrogenica.com/showthread.php?9839-U152-Predictions-with-Nevgen&p=216895&viewfull=1#post216895
I take it you are saying those are from public kits. I don't see any reason that they would not be included unless they are recent results. The bad prediction would be due to a haplotype that is too similar to the predicted subclades. Like rms2 stated "I think we can chalk up some of the mispredictions to the rapid expansion of P312 in the Bronze Age and the sudden proliferation of clades from sires with strikingly similar haplotypes."

MitchellSince1893
03-08-2017, 07:32 PM
The samples in my link have been publicly available in the U152 project for quite some time. I will defer to your knowledge on the subject.

kw5368
03-08-2017, 08:25 PM
on the left hand side there are what look like three sliders. Click those to select the matching algorithm. Choose the one for R1b subclade.

Is everyone using this? Just asking. Could explain the discrepancies. Everyone I have checked has been pretty close.

Nevski
03-09-2017, 04:42 AM
I can't help but wonder if the creators of this tool have entered many of the individual haplotypes for certain SNPs so that when you re enter those same values you obviously come up 100% correct; and the ones that come up way off are because those haplotypes have not yet been entered into the tool. :unsure:

I hope you don't mind if I, uninvited, try to answer this question. I think I am the right person for the job, since I am one of the two guys administering the NevGen predictor.

This is a good remark, and the effect you noticed is what we call the "effect of poor statistics". Many R1b subclades in NevGen have poor statistics, since they are built on a small number of available haplotypes (in many cases only 5, in a few cases even fewer). And because of the algorithm used (Bayesian frequency improved by marker value correlation, both negative and positive), the predictor is always biased towards the haplotypes used for the statistics (they match the correlations perfectly, because the correlations were derived from those same haplotypes). Of course, a prediction is never made by comparing the entered haplotype with the haplotypes in the statistics; that would be idiotic and too slow, and would not work if any marker were changed. NevGen must give meaningful output for any input, even if random numbers are entered.

But such bias is not almighty and absolute. I will give two examples.
The first is a small experiment we made last year. We intentionally put an A00 haplotype (full 111 markers) into the statistics of one of the G subclades supported by NevGen. After the data was compiled, we put the A00 haplotype into the predictor and it was not predicted as a member of the G subclade. It gave 0 probability and very poor values for both fitnesses, even though it was part of the statistics. In this case, the result was as I had hoped.

Another example: last summer, in one of the subclades under U106 (I don't remember which, let's call it Subclade A), we found a 111-marker haplotype that was part of its statistics (it was listed in that subclade's section of the U106 project, so we had picked it up), but when placed into the predictor it gave a perfect 100% probability, with excellent fitnesses, for another subclade of U106, not the one whose statistics it was in. Then I realised that it had only been (wrongly) predicted to belong to Subclade A in the U106 project (or placed into the wrong section by mistake), and not actually SNP-tested, as I had thought. Some time later, I saw that the haplotype had been removed from the section for Subclade A and placed into an "unknown" section (or something like that). So in this case the predictor assigned it to another subclade even though it was biased towards Subclade A.

A few days ago I went through the Iberian project and found 29 deep SNP-tested R1b haplotypes that we had not seen before. Since they were not part of the statistics and their deep subclades were known, I made a little experiment and put them all into NevGen to see how many would be predicted correctly. Out of the 29 haplotypes, 20 had their known subclade listed first in the NevGen prediction (one of them had only 37 markers). Of the 9 wrong ones, in one or two cases the SNP-verified subclade was second or third in the NevGen list. Out of 4 haplotypes under U152, only one was predicted correctly. All of those haplotypes are now part of the statistics, so it is now a little better.
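
The counts from this Iberian test can be turned into accuracy figures. The sketch below only does the arithmetic on the numbers given in the post; the `top_k_accuracy` helper and its toy labels are hypothetical illustrations, not part of NevGen.

```python
def top_k_accuracy(true_labels, ranked_predictions, k=1):
    """Fraction of samples whose true label is among the first k ranked predictions."""
    hits = sum(
        1 for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Counts reported in the post above.
total, first_listed = 29, 20
u152_total, u152_first_listed = 4, 1

print(f"all R1b, listed first: {first_listed / total:.1%}")              # 69.0%
print(f"U152 only, listed first: {u152_first_listed / u152_total:.1%}")  # 25.0%

# "One or two" of the 9 misses were listed second or third, so the share with
# the true subclade in the first three lies between 21/29 and 22/29.
print(f"top-3 range: {21 / total:.1%} to {22 / total:.1%}")              # 72.4% to 75.9%

# Toy usage of the helper with made-up labels (hypothetical, not thread data).
truth = ["A", "B", "C"]
ranked = [["A", "B"], ["C", "B"], ["A", "B"]]
top1 = top_k_accuracy(truth, ranked, k=1)
top2 = top_k_accuracy(truth, ranked, k=2)
```

With only 4 U152 samples, the 25% figure is a very rough estimate; one more correct prediction would double it.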

Well, we already knew that U152 is the most problematic of all the major R1b branches. Since the algorithm is the same for all of them, the reason is that U152 has the smallest number of available deep SNP-tested haplotypes. We are trying to find more of them to improve the statistics.

www.nevgen.org/AboutNevGen.html

The NevGen predictor is far from perfect, but it is a work in progress, and it gets better with every new haplotype added to its statistics. Subclades with more haplotypes in the statistics are less prone to such bias and have much greater accuracy than those with only a small number of haplotypes available.
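
To make the "effect of poor statistics" concrete, here is a minimal sketch of a frequency-based Bayesian classifier for STR haplotypes. It is not NevGen's code: it leaves out the marker value correlations mentioned above, and the subclade names, the 4-marker haplotypes, the smoothing constant `ALPHA` and `N_VALUES` are all invented for illustration.

```python
from collections import Counter
import math

# Hypothetical training data: subclade -> haplotypes (repeat counts at 4 STR markers).
TRAINING = {
    "SubcladeA": [(13, 24, 14, 11), (13, 24, 14, 10), (13, 23, 14, 11)],
    "SubcladeB": [(13, 24, 15, 11), (14, 24, 15, 11), (13, 24, 15, 12)],
}

ALPHA = 0.5    # additive smoothing so unseen values keep a nonzero probability
N_VALUES = 10  # assumed number of possible repeat counts per marker


def train(training):
    """Count, per subclade and marker position, how often each repeat value occurs."""
    model = {}
    for clade, haplotypes in training.items():
        counts = [Counter(column) for column in zip(*haplotypes)]
        model[clade] = (counts, len(haplotypes))
    return model


def posterior(model, haplotype):
    """Return P(subclade | haplotype), treating markers as independent, equal priors."""
    log_like = {}
    for clade, (counts, n) in model.items():
        total = 0.0
        for counter, value in zip(counts, haplotype):
            total += math.log((counter[value] + ALPHA) / (n + ALPHA * N_VALUES))
        log_like[clade] = total
    peak = max(log_like.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(v - peak) for c, v in log_like.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


model = train(TRAINING)
seen = posterior(model, (13, 24, 14, 11))    # a haplotype that is in the training set
unseen = posterior(model, (14, 23, 15, 10))  # a haplotype that is not
print(seen)
print(unseen)
```

With three haplotypes per subclade, each training sample accounts for a third of every marker frequency, so a haplotype that was used to build the statistics tends to score well for its own subclade. In this toy data the training haplotype gets 87.5% for SubcladeA, and adding or removing a single sample would shift that noticeably.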

MitchellSince1893
03-09-2017, 06:05 AM
Nevski, thank you for taking the time to respond, and welcome to Anthrogenica. :welcome:

Nevski
03-09-2017, 10:41 AM
Mitchell, thank you very much for your welcome post.

One more thing I need to say: in many cases, even with hundreds or thousands of deep SNP-tested haplotypes available, prediction might not give good accuracy.
That is the case when subclades are too wide (diverse) and too close to each other.
An excellent example is our first, unsuccessful attempt to divide R1b into several first-level subclades: L21, DF27, U152, U106 and Z2209.
For the first 4 of them it did not work well, even with all the marker value correlations used. They needed to be divided into much deeper subclades in order to give better accuracy scores during testing: a "divide and conquer" strategy. That is how we now have about 290 subclades of R1b.
And in the case of U152, some of the subclades are still too wide, so they must be subdivided further in the predictor to be recognized better. One such is Z36, which is much worse than problematic; it is a nightmare. Once, in fact, I really did dream about it. :-) We tried many ways to get something out of it, but it is still too diverse to be predicted correctly. It needs many more deep SNP-tested haplotypes in order to be divided into deeper subclades which might be predicted well.

Another thing is the incomplete list of subclades that should be taken into account. For example, under Z49 we currently have only the following subclades. Other Z49 haplotypes, which belong to unsupported or not yet discovered subbranches, will probably end up as false positives.

Z49>S8183
Z49>Z142> FGC22963
Z49>Z142> L562
Z49>Z142> Z150> BY1701
Z49>Z142> Z150> FGC12378
Z49>Z142> Z150 (all the rest of Z150)

There is no good solution for that. We can only wait until more haplotypes are available and then add more subclades to NevGen.
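
The false-positive problem with an incomplete subclade list can be illustrated with a small sketch. When probabilities are normalized over the supported subclades only, a haplotype that fits none of them well still gets close to 100% for the least bad one. How NevGen computes its "Probability of unsupported subclade" line is not described in this thread, so the flat background term below (`UNSUPPORTED_LOG_LIKE`), the subclade labels and all the numbers are assumptions made for illustration.

```python
import math


def normalize(log_likelihoods):
    """Turn log-likelihoods into probabilities that sum to 1 (equal priors)."""
    peak = max(log_likelihoods.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(v - peak) for k, v in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical log-likelihoods for a haplotype from a branch the model does not know.
# All three fits are poor in absolute terms, but KnownA is the least poor.
log_like = {"KnownA": -60.0, "KnownB": -75.0, "KnownC": -90.0}

# Closed set: only supported subclades compete, so KnownA takes almost everything.
closed = normalize(log_like)

# Open set: an assumed background model competes as an extra "unsupported" class.
UNSUPPORTED_LOG_LIKE = -55.0  # hypothetical value, not taken from NevGen
open_set = normalize({**log_like, "unsupported": UNSUPPORTED_LOG_LIKE})

print(f"closed set, KnownA:    {closed['KnownA']:.4f}")
print(f"open set, unsupported: {open_set['unsupported']:.4f}")
```

This is also one reason a separate goodness-of-fit figure is useful next to the probability: the probability only says which candidate is best relative to the others, not whether any of them fits well.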

Nevski
06-30-2017, 06:42 PM
A couple of weeks ago, in one of the FTDNA projects, I stumbled across a 67-marker haplotype flagged with SNPs U152+, L2+ and, as its terminal SNP, S1567+. I entered it into the predictor and got 100% M222 (which is under L21 and mostly Irish), with high fitnesses!
On the generated picture there were many green columns, which mark markers that match the M222 signature. Many if not all of the M222 > S658 signature markers were there. If it had been predicted as any subclade other than M222, I would just have said "Yet another wrong R1b subclade prediction, I have seen too many of them :(". But it was M222, one of the rare R1b subclades that are recognisable from outer space. I was sure there was an error, either with the SNPs or with the markers.
And the guy was from the United Kingdom and has an Irish-looking name. I just put the haplotype aside and wrote my notes about it.

And today I stumbled onto another 67-marker haplotype, flagged with L21+, DF49+, M222+ and U152- (among others). I put it into the predictor to see which subclade of M222 it might be, and the prediction astonished me:

Probability = 100.00% Fitness=80.55 [1.13] R1b U152> L2> S1567

The fitness for S1567 is very good, and there are several green columns, which means the markers match the S1567 signature. This is a rare occasion when a U152 prediction yields good and clear results. In this case there was no doubt that the prediction is reliable. The guy is from England, and it is obvious from the STRs that he is not M222.

I suspect that these two kits were somehow mixed up. What do you think should be done here? Is it advisable to inform FTDNA support about this?
I think it is not a good idea to publish those two kits or their haplotypes, to preserve their owners' privacy.

R.Rocca
06-30-2017, 07:38 PM
From what I know, the Irish kit owner has been sent a new kit by FTDNA.

Kwheaton
06-30-2017, 11:53 PM
Finally got round to doing this. It accurately predicted R1b U152>L2> FGC22501 with 100% probability. Very impressive, and I will suggest this to some who are too cheap to test!

Solothurn
10-31-2017, 06:20 AM
I used it and got: R1b U152>L2> PF6658>> FGC30121

Any ideas why it gives PF6658 as an L2 subclade?

TRanger
03-25-2018, 10:54 PM
Sorry to revive an old thread but I just stumbled across it. I plugged in my numbers and got a probability of unsupported subclade: 98.47%. The closest match I got was R1b L51>L151> CTS4528> S14328 at 0.76%. I was surprised as Rich Rocca was predicted and we are both L2> ZZ48> ZZ56. Anyone care to comment?

Allen Slaughter

Acque agitate
03-26-2018, 09:33 AM
Hi Allen,
a few days ago Ytree.net discovered that you, as well as belonging to the ZZ48+ ZZ56+ group, are part of a subgroup together with Henry Baston (FTDNA B79056). This new subgroup is characterized by a single SNP, 7560719-A-C (named Y84137).