ISOGG 2016 Y-DNA haplogroup R1b tree



grouza31
02-16-2016, 04:31 PM
Does anyone understand the new 2016 ISOGG R1b Y-DNA haplogroup tree, and more precisely the R1b1c and R1b1a2 branches extracted below?
Is there now no link between V88 and R1b1c?
Is there not a contradiction between the presentation of the R1b1a2 branch and the following ISOGG statement?
“Paragroup R1b1* and R1b1c-V88 are found most frequently in SW Asia and Africa. The African examples are almost entirely within R1b1c and are associated with the spread of Chadic languages”.


• • R1b M343/PF6242
• • • R1b1 L278, CTS46, CTS910/FGC66/PF6244, CTS2134/FGC47/PF6253, CTS2229/FGC31/PF6254,
• • • • R1b1a L754/PF6269/YSC0000022, A702/Z8137, CTS3063/V2515, CTS3794/PF6256/S4229, CTS4244/PF6257/V2997/YSC0001279, CTS4764/FGC59/PF6041, CTS5454/M6955, CTS7585, CTS8436/PF6259, CTS8612/PF6260, CTS9972/PF6261/V3867, FGC35/PF6150, FGC36/Y409, FGC41/M12190/SK2062/V1501/Y108/Z8135, L761/PF6258/YSC0000266, L820/PF6262/V4199/YSC0000211, L1068/PF6264/YSC0000223, L1345/PF6266/YSC0000224, PF6249, PF6263, PF6271
• • • • • R1b1a1 L388/PF6468, L389/PF6531
• • • • • • R1b1a1a P297/PF6398, CTS3876/PF6458, CTS5082, CTS5577/FGC38/PF6464, CTS7904/FGC32/PF6471, CTS7941/FGC51/PF6472, CTS9018/FGC188/PF6484, CTS11985/PF6523, FGC57/L1067/M12189/PF6424/SK2079/Z8148, FGC69/L320/PF6092/Z8143, L502/PF6487, L585/PF6499, L752/PF6483, PF6418/YSC0000061, PF6451/YSC0000166, PF6459/S3848, PF6463, PF6475/S17/YSC0000269, PF6498, PF6501, PF6506, PF6524
• • • • • • • R1b1a1a1 M73, M478
• • • • • • • R1b1a1a2 M269, CTS623, CTS2664/PF6454, CTS3575/PF6457, CTS8591, CTS8665/FGC464/PF6479, CTS8728/L1063/PF6480/S13, CTS10834, CTS11468/FGC49, CTS12478/PF6529, F1794/PF6455, L265/PF6431, L407/PF6252, L478/PF6403, L482/PF6427, L483, L500/PF6481, L773/PF6421/YSC0000276, L1353/PF6489/YSC0000294, M520/PF6410, PF6399/S10, PF6404, PF6505/YSC0000225, PF6409, PF6411, PF6425, PF6430, PF6432, PF6434, PF6438, PF6443, PF6482/YSC0000203, PF6485/S3, PF6494, PF6495, PF6497/YSC0000219, PF6500, PF6507, PF6509, L150.1/PF6274.1/S351.1
• • • • • • • • R1b1a1a2a L23/PF6534/S141, L49.1/S349.1
.
.
• • • • • R1b1a2 PF6279/V88
• • • • • • R1b1a2a M18
• • • • • • R1b1a2b V35
• • • • • • R1b1a2c V69
• • • • R1b1b~ M335
• • • • R1b1c PH155, F2482, PH200, PH491, PH861, PH885, PH1030, PH1165, PH1187, PH1417, PH1554, PH1769, PH1840, PH2150, PH2274, PH2588, PH2675, PH2768, PH2778, PH2813, PH3272, PH3508, PH3516, PH3826, PH3939, PH4174, PH4622, PH4673, PH4796, PH5173, SK2056, SK2058, SK2060, SK2061

Megalophias
02-16-2016, 05:22 PM
It is just a change in terminology, based on the number and relationship of the known branches of R1b, not a change in the facts of distribution. This happens all the time with Y haplogroups; the terminology is always changing (for instance, last year's O2a1 is now O1b1a1a, and O3 is now O2!). The names like "R1b1bc" are just a convenience (and often confusing); what matters is the *defining mutation*, V88. So it doesn't matter if they call it R1b1c, or R1b1a2, or R1b1b, it is the same thing. Yes, to be consistent they should have changed the name on that page, but it will still be different everywhere else anyway, and it makes no difference what you call it: it's still the same thing, V88.

It has changed because they now recognize a newly discovered branch of R1b, R1b-PH155, which recent studies found in Tajikistan and Bhutan. The branches of V88 and L389 (from which M269 is descended) are more closely related to each other than to PH155; they are united by the SNP L754, so their shared lineage is now R1b1a. The position of the very rare mutation M335 is still unknown, so it gets its own branch, R1b1b, although in reality it may belong to one of the others, while R1b-PH155 ends up being R1b1c.
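To make the "names change, defining mutations don't" point concrete, here is a minimal Python sketch. It encodes only a subset of the 2016 extract posted at the top of this thread, keyed by defining SNP; the dictionary layout and the function names are my own illustration, not anything ISOGG publishes, and M335 is attached under R1b1 only provisionally, as the tree's "~" indicates.

```python
# Parent of each clade, keyed by defining SNP (subset of the 2016 ISOGG
# extract quoted in this thread). M335's position is provisional.
PARENT = {
    "M343": None,     # R1b
    "L278": "M343",   # R1b1
    "L754": "L278",   # R1b1a
    "L389": "L754",   # R1b1a1
    "P297": "L389",   # R1b1a1a
    "M73": "P297",    # R1b1a1a1
    "M269": "P297",   # R1b1a1a2
    "V88": "L754",    # R1b1a2
    "M335": "L278",   # R1b1b~
    "PH155": "L278",  # R1b1c
}


def lineage(snp):
    """Return the chain of defining SNPs from the root down to `snp`."""
    path = []
    while snp is not None:
        path.append(snp)
        snp = PARENT[snp]
    return path[::-1]


def common_ancestor(a, b):
    """Return the most recent shared defining SNP of two clades."""
    shared = None
    for x, y in zip(lineage(a), lineage(b)):
        if x != y:
            break
        shared = x
    return shared


print(" > ".join(lineage("M269")))
print(common_ancestor("V88", "M269"))   # shared node of V88 and M269
print(common_ancestor("V88", "PH155"))  # shared node of V88 and PH155
```

Whatever letters get assigned to the branches next year, these relationships stay the same, because they are stated in terms of the SNPs.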

With mitochondrial DNA haplogroups they do the opposite, leave the names the same and just add on new levels - but then it's just as confusing, because you end up with D2 being a branch of D4e1, D being a sister of M80 under M, and M being part of L3. (But mtDNA doesn't have proper defining mutations so you really don't have much choice.)

grouza31
02-16-2016, 08:46 PM
Many thanks.

grouza31
02-17-2016, 10:12 AM
If I'm not mistaken, the ISOGG 2016 R1b tree is partly based on the experimental tree of Ray Banks. Why does ISOGG not integrate, particularly for the V88 branch, the more detailed data of Smal's R1b tree or of the tree established by YFull?
Is there no general consensus on the R1b tree established by Smal or on that of YFull?
Or are there other reasons?

smal
02-17-2016, 12:09 PM
grouza31, these new changes in the ISOGG tree have occurred due to my activity. Of course, I plan to improve the V88 branch as well. However, this is a time-consuming process. I need to prepare an elaborate submission, and V88 is only one of many clades that have to be improved.

VinceT
02-17-2016, 12:09 PM
"ISOGG" is not a formal, or legal, institution. Its tree has little scientific oversight, if any. It does not receive financial support either. Katherine Borges controls all aspects of the ISOGG.org internet domain.

Ray Banks was appointed by Katherine Borges' friend Alice Fairhurst to take over management of the ISOGG tree. Ray is an hg G specialist, but has poor knowledge of haplogroup R. Neither Katherine, Alice, nor Ray is a professional scientist.

Smal is the R1b-V88 expert. Currently, he is attempting to reconcile his tree with Ray's understanding of the tree. However, it will take a while before that part of the ISOGG tree is updated, due to the number of changes that are required, and the fact that all changes are still hand-coded with an HTML editor and saved as HTML documents on Katherine's website.

grouza31
02-17-2016, 01:02 PM
Smal and Vincent, thank you for your explanations.
Smal, I know you're a great expert and a great champion. Thank you very much for your work.

grouza31
02-17-2016, 02:51 PM
If neither Katherine, Alice, nor Ray is a professional scientist, who develops scientific and technical recommendations like the one below, and on what basis?
I chose this particular recommendation because we discussed the relationship between SNPs, Y-STRs and Y haplogroup tree building in another thread.
Some people said that Y-STRs are not used in the development of the Y haplogroup tree and that there is no connection between STRs and SNPs.
How, then, should we interpret this ISOGG recommendation, even if its authors are not professional scientists?

• Sanger Sequencing
Examples of Sanger sequencing are the tests at the company ySeq and the Advanced Tests (SNP) at Family Tree DNA. STR testing is available, for instance, at Genebase and Family Tree DNA. Acceptable testing for this category consists of Sanger sequencing which targets a short segment of Y-DNA.

The objective of the ISOGG Tree at this time is to include all SNPs that arose prior to about the year 1500 C.E. This guideline may be measured through STR diversity or alternative evidence.

Where a new terminal subgroup is being added, STR marker results or other evidence described below for two men with the new SNP are needed.

1 STR Diversity
To be accepted the SNP must be observed in at least two individuals and must meet the STR diversity requirement. A SNP that does not meet this requirement will be classified as a Private SNP (see definition above).

The STR diversity requirement is met if the following conditions are satisfied:
a. If the SNP is a Non-Terminal Branch SNP, no further proof of diversity is required.
b. Genetic distance is calculated using the Infinite Alleles Model (IAM). A marker for which there is a null value in one sample must be discarded from the calculations. Otherwise, most laboratories use the IAM.
c. All markers tested by both individuals must be compared.
d. If 74 markers (or fewer) are compared, the minimum genetic distance to meet the diversity requirement is 5.
e. If 75 (or more) markers are compared, the diversity requirement is a minimum of 7%, computed by dividing the genetic distance by the number of markers compared, and rounding to the nearest integer value.
e. If 75 (or more) markers are compared, the diversity requirement is a minimum of 7%, computed by dividing the genetic distance by the number of markers compared, and rounding to the nearest integer value.

Alternative Evidence
If the submitter can otherwise provide evidence that the common ancestor of the two samples can be reasonably expected to have lived more than 500 years ago, this will also be considered.
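
For anyone who wants to see how conditions a to e of the quoted rule play out on real marker sets, here is a rough Python sketch. It is my reading of the text, not ISOGG's own procedure: under the Infinite Alleles Model each differing marker counts as one step regardless of the size of the difference, a marker with a null (here `None`) in either sample is dropped, and "rounding to the nearest integer" is taken as round-half-up on the percentage. The function and parameter names are invented for illustration.

```python
def iam_distance(hap_a, hap_b):
    """Return (genetic distance, markers compared) under the Infinite Alleles Model.

    hap_a, hap_b: dicts mapping STR marker name -> repeat count, or None for a null.
    Only markers tested by both men and non-null in both are compared.
    """
    shared = [m for m in hap_a
              if m in hap_b and hap_a[m] is not None and hap_b[m] is not None]
    # IAM: any difference at a marker counts as exactly one step.
    distance = sum(1 for m in shared if hap_a[m] != hap_b[m])
    return distance, len(shared)


def meets_str_diversity(hap_a, hap_b, non_terminal=False):
    """Apply conditions a, d and e of the quoted rule (assumed interpretation)."""
    if non_terminal:
        return True  # condition a: no further proof of diversity required
    distance, compared = iam_distance(hap_a, hap_b)
    if compared == 0:
        return False
    if compared <= 74:
        return distance >= 5  # condition d
    percent = int(100 * distance / compared + 0.5)  # round half up (assumption)
    return percent >= 7  # condition e


# Two made-up 37-marker haplotypes differing at 5 markers.
man_1 = {f"STR{i}": 12 for i in range(37)}
man_2 = dict(man_1, STR0=13, STR1=14, STR2=11, STR3=15, STR4=13)
print(iam_distance(man_1, man_2), meets_str_diversity(man_1, man_2))
```

Note that at 111 markers this reading needs 8 differing markers, since 7 out of 111 is 6.3% and rounds to 6.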

MacUalraig
02-17-2016, 03:25 PM
I am not trying to defend the above, but the point is that the structure of the tree is built solely from SNPs. The reference to STRs above is restricted to dating terminal SNPs (and workarounds, presumably such as paper trails, are acceptable alternatives).

VinceT
02-17-2016, 09:34 PM
Each research lab had their own sample data-set and their own Y-SNP identifiers, and their own Y-phylogenies. The ISOGG's tree began as an effort to compile those various Y-phylogenies and Y-SNP identifiers together and reconcile them with each other. However these trees and variants were generally biased towards anonymous samples from specific population groups.

Since ISOGG has a genealogical focus, there is an ongoing debate among the haplogroup project experts about making the ISOGG tree genealogically relevant, or at least able to bridge the gap between anthropological relevance and genealogical relevance. Unfortunately, that realm has seldom been explored by professional population geneticists, and as such has fallen into the hands of citizen scientists and amateurs.

The current recommendation has been set after extensive discussion by the haplogroup experts appointed to maintain and review the ISOGG's tree, and is periodically reviewed and debated as new testing technologies become available. Since none of these experts have first-hand experience using these technologies themselves in a professional laboratory setting, these recommendations are often reviewed and debated - by those same citizen scientists and amateurs.

(I absolutely do not mean to knock against those involved in citizen science, since on several occasions citizen scientists have noted patterns and inconsistencies in professionally acquired research that eventually were invalidated by subsequent research.)

The STR diversity requirement was derived from a controversial interpretation of p>0.05 (https://en.wikipedia.org/wiki/P-value) used by the [now-defunct?] Y-Chromosome Consortium, from whose tree the ISOGG Y-tree was initially derived. The implied interpretation was that diversity within a set of haplotypes belonging to any particular haplogroup should vary by at least 5% of the markers used by the set of haplotypes, in order to be statistically relevant in the phylogeny. As Y-STRs are so much more volatile (and prone to back-mutations) than Y-SNPs, it is hoped that this particular recommendation will eventually fall into obsolescence, if it hasn't already.

Cofgene
02-18-2016, 12:49 AM
grouza31 quoted the ISOGG criteria:
"The STR diversity requirement is met if the following conditions are satisfied:
a. If the SNP is a Non-Terminal Branch SNP, no further proof of diversity is required."


The criteria can also be internally inconsistent. For this whoopsie I have SNPs which are non-terminal and placed within the last 300 years. I obviously don't meet the time or STR diversity requirements.

The section that was presented is an example of the technical bias and technical inconsistency present in the listing criteria. It specifically calls out Sanger sequencing as OK, then in later sections they rag on forever about how NGS doesn't rate and you need to do analysis contortions and meet specified criteria determined by "intuition" to get an NGS result in acceptable form. The problem with what they briefly mention about Sanger is that what YSEQ practices is NOT the Sanger that was done for the genome sequencing efforts. YSEQ does single reads. Genomic Sanger sequencing worked off of contigs and was the normal assembly of multiple reads. There are no listed quality criteria, nor personnel training requirements presented for what it takes to do single-read Sanger tests. How do we know that FTDNA and YSEQ follow the same analysis protocol for single Sanger reads? There are quality levels and interpretation 'rules' which apply in specific cases for Sanger-based results. None of that is mentioned. Just that Sanger is blessed as being the almighty reference standard way of doing things.

Note that the testing recommendations coming from YSEQ are dependent on what they can do with single read operations. There are a good number of variants that are testable using protocols other than what YSEQ offers. Yes it will cost you more but they can be used to verify some of those that YSEQ won't touch. As a community we need to stop following all that YSEQ states as being "testable" and get more familiar with using other technologies.

As Vince indicates the standards have been driven by a very small number of non-scientist individuals. Discussions to improve them and the quality of the listing processes are banned by the ISOGG leader in their forums. Whenever the current round of discussion censorship by the ISOGG leader on y-tree topics ends we will once again go back to redraw and improve the listing criteria. Until then the y-tree remains an interesting resource but not one you should place absolute faith in.

grouza31
02-18-2016, 11:56 AM
If I understand correctly, the following do not currently exist:
- A recognized, standardized analytical technique accepted by everyone,
- A reliable methodology for quality control of analytical results; or, if it exists, it is not used by everyone,
- Standardized, reliable criteria for the development of the haplogroup tree; everyone cooks up their own recipe in their own corner.

Moreover, the YSEQ SNP tests, presented to us as the most reliable in the world, are not.

How can we reassure people who expect a lot from these DNA and SNP tests?

Will all these experts one day agree on something, in the interest of all of us who hope to understand and know our history, our roots, our past?

grouza31
02-19-2016, 09:59 AM
Let's look at the positive side of all these tools and return to the building of the R1b tree.

In another thread, Smal explained to me how to build the haplogroup tree and how to determine the ancestral and derived values of SNPs:
“You need to compare individuals from different haplogroups or branches. Ancestral/derived stages for SNPs can be determined after the tree building. There are no needs to study skeletons.”

Below are my 10 SNPs that are considered (not by FTDNA, but by Smal, YFull and VinceT) to have the ancestral value relative to the M269 branch.
As I'm V88+, can anyone tell me whether these 10 SNPs, which are ancestral relative to the M269 branch, are present in the ancestral state:
- Among all people in the V88+ group, or only among part of V88+?
- Among all people of the R1b group except those of the M269 branch?
- Among all people of the R group except those of the M269 branch?
- Among all people of all haplogroups (A, B, C, D, E, F, G, H, I, J, K, P, Q, R....) except those of the M269 branch?

SNP                      Ancestral  Derived  REF (according to VinceT)
CTS10834                 T          C        C
CTS11468/FGC49/PF6520    G          T        T
CTS7400/FGC33/PF6469     T          C        C
CTS7659/FGC50/PF6470     C          G        G
CTS8591/FGC64/PF6477     A          C        C
CTS8665/FGC464/PF6479    T          C        C
CTS623/FGC37/PF6419      T          G        G
CTS11948/FGC54/PF6522    G          A        A
CTS12972/FGC52/PF6532    C          G        G
CTS5577/FGC38/PF6464     A          C        C
CTS9018/FGC188/PF6484    C          T        T

VinceT
02-19-2016, 03:23 PM
You are free to look up the positions for the above in the file "FG1006A.realn.dedup.baseQrecal.vcf.gz" (a sample belonging to haplogroup R-U106>FGC396>L199.1), available at
https://drive.google.com/folderview?id=0B_8cnkmEcsIGbDR1cThwUFNiNnM&usp=sharing#list

It's quite a large file, and I recommend using tabix on a Linux machine, like so:

$ tabix FG1006A.realn.dedup.baseQrecal.vcf.gz Y:22796697-22796697
Y 22796697 . C . 163.23 . AN=2;DP=52;MQ=60.00;MQ0=0 GT:DP 0/0:52

Actually, never mind that. Here's the list of positions (let's call it "positions.tab"):

Y 6912992 CTS623/FGC37/PF6419
Y 16376495 CTS5577/FGC38/PF6464
Y 17461478 CTS7400/FGC33/PF6469
Y 17594966 CTS7659/FGC50/PF6470
Y 18095336 CTS8591/FGC64/PF6477
Y 18137831 CTS8665/FGC464/PF6479
Y 18617596 CTS9018/FGC188/PF6484
Y 22796697 CTS10834
Y 23124367 CTS11468/FGC49/PF6520
Y 23379254 CTS11948/FGC54/PF6522
Y 28771116 CTS12972/FGC52/PF6532

And by revising the tabix command a bit:

$ tabix FG1006A.realn.dedup.baseQrecal.vcf.gz -R positions.tab
Y 6912992 . G . 440.23 . AN=2;DP=163;MQ=59.86;MQ0=0 GT:DP 0/0:163
Y 16376495 . C . 212.23 . AN=2;DP=71;MQ=60.00;MQ0=0 GT:DP 0/0:71
Y 17461478 . C . 202.23 . AN=2;DP=71;MQ=60.00;MQ0=0 GT:DP 0/0:71
Y 17594966 . G . 142.23 . AN=2;DP=43;MQ=60.25;MQ0=0 GT:DP 0/0:43
Y 18095336 . C . 184.23 . AN=2;DP=60;MQ=60.18;MQ0=0 GT:DP 0/0:60
Y 18137831 . C . 729.23 . AN=2;DP=250;MQ=59.42;MQ0=0 GT:DP 0/0:249
Y 18617596 . T . 708.23 . AN=2;DP=249;MQ=59.83;MQ0=0 GT:DP 0/0:249
Y 22796697 . C . 163.23 . AN=2;DP=52;MQ=60.00;MQ0=0 GT:DP 0/0:52
Y 23124367 . T . 323.23 . AN=2;DP=110;MQ=59.79;MQ0=0 GT:DP 0/0:110
Y 23379254 . A . 269.23 . AN=2;DP=98;MQ=59.38;MQ0=0 GT:DP 0/0:98
Y 28771116 . G . 127.23 . AN=2;DP=36;MQ=59.36;MQ0=0 GT:DP 0/0:36
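
If reading those records by eye is tedious, something along the following lines could turn them into ancestral/derived calls. This is only a sketch in Python: it copies three of the records above and the matching rows of the ancestral/derived table, splits on whitespace because the tabs were lost when posting, and assumes every record has a called genotype. `SNPS`, `VCF_RECORDS` and `call_state` are names I made up for the example.

```python
# Ancestral and derived alleles for three of the SNPs tabulated in this
# thread, keyed by the Y position listed in positions.tab.
SNPS = {
    6912992: ("CTS623", "T", "G"),
    18617596: ("CTS9018", "C", "T"),
    22796697: ("CTS10834", "T", "C"),
}

# Three of the tabix output records, copied from the post above.
VCF_RECORDS = """\
Y 6912992 . G . 440.23 . AN=2;DP=163;MQ=59.86;MQ0=0 GT:DP 0/0:163
Y 18617596 . T . 708.23 . AN=2;DP=249;MQ=59.83;MQ0=0 GT:DP 0/0:249
Y 22796697 . C . 163.23 . AN=2;DP=52;MQ=60.00;MQ0=0 GT:DP 0/0:52
"""


def call_state(record):
    """Return (SNP name, 'derived' | 'ancestral' | 'unclear') for one VCF record."""
    fields = record.split()
    pos, ref, alt = int(fields[1]), fields[3], fields[4]
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    name, ancestral, derived = SNPS[pos]
    # Allele index 0 is REF; 1, 2, ... index into the ALT list ("." means none).
    alleles = [ref] + ([] if alt == "." else alt.split(","))
    observed = {alleles[int(i)] for i in sample["GT"].replace("|", "/").split("/")}
    if observed == {derived}:
        return name, "derived"
    if observed == {ancestral}:
        return name, "ancestral"
    return name, "unclear"


calls = dict(call_state(r) for r in VCF_RECORDS.splitlines())
for name, state in calls.items():
    print(name, state)
```

A genotype of 0/0 means the sample matches the reference base, so for this R-U106 sample all three positions come out derived, which is what is expected for someone downstream of M269.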

grouza31
02-25-2016, 04:57 PM
Thank you, VinceT, for your data.
However, it seems that FTDNA is right with regard to the 10 SNPs that I mentioned. The alleles that V88+ members have for these SNPs are not ancestral to the R-M269 branches and sub-branches but derived. Indeed, from my small investigation, there are members of the Yahoo R-Z255 group who have the same alleles for these SNPs as the members of the R-V88 group. For example, for the SNP CTS10834, the allele, according to tests at YSEQ, is T among some members of the Yahoo R-Z255 group. This is the same value, T, that I also have in the YSEQ and FTDNA tests, although I am in the R-V88 group.
If FTDNA still says that these 10 SNPs, found among some members of the R-V88 group, are SNPs that define the R-M269 branch and sub-branches, it is because FTDNA has data showing many people in R-M269 and its sub-branches with the same alleles as members of R-V88.

These tests clearly confirm that one could be both V88+ and M269+.
Such a case, which Smal contested, was already presented by FTDNA.

grouza31
03-29-2016, 09:21 AM
Let's come back to the building of the Y haplotree by ISOGG, FTDNA, YFull, Smal and others, and the rules governing the construction of these trees.
I recently read scientific papers on population genetics.
Here is the summary:
1) In a given population, the alleles of a given SNP are distributed randomly among the individuals of that population. For example, if we take a population group in Europe and consider the SNP M269, some individuals will randomly have allele T and others allele C.
2) In a given large population, the allele of a given SNP becomes fixed in only 2 cases out of 20.
3) The SNPs that distinguish population groups are only those whose allele frequencies vary greatly from one population group to another. These represent only 10% of all SNPs.

From observations 1, 2 and 3, how did you (Smal, YFull, ISOGG, FTDNA and others) arrive at the Y haplotree? Why do you use a single allele of a SNP (the derived allele) to establish a branch if multiple alleles of the same SNP may be present randomly in the same population, with the same background and the same origin, living in the same territory?

YFull, how do you calculate the TMRCA if you consider only a single (derived) allele of each SNP?

smal
03-29-2016, 01:29 PM
Hope you understand the difference between patrilineal (Y chromosome) and autosomal inheritance. You are talking about autosomal inheritance.

grouza31
03-29-2016, 01:41 PM
Smal, I'm not talking about autosomal inheritance. I'm talking about Y-chromosome SNPs.

grouza31
03-29-2016, 03:07 PM
Smal, take a look at this excerpt from the dbSNP record for the M269 SNP (rs9786153). What do you think about the frequencies of alleles T and C of this SNP in various populations (in Europe, Africa, America, ...)?
[Attachment 8489]

grouza31
03-29-2016, 03:37 PM
Look also at the frequencies of alleles T and C of the SNP CTS10834 (rs9785897) in various populations.
[Attachment 8491]

ArmandoR1b
03-29-2016, 10:58 PM
For a derived result of C at M269 (rs9786153) or CTS10834 (rs9785897) to be relevant, almost all of the upstream SNPs also need to be derived. Finding descendants of people who are derived for M269 and CTS10834 and all of the upstream SNPs in far-off lands simply means that their ancestors traveled far between the time the SNPs first appeared in a person and the time the descendant was born in the far-off land. So population frequencies, in the context in which you are using them, have no relevance for the building of the Y haplotree. People who have T for M269 (rs9786153) and CTS10834 (rs9785897) are simply ancestral, which is what is expected for anyone who is not in the R1b haplogroup or who is in a different branch of R1b.

grouza31
03-30-2016, 08:35 AM
ArmandoR1b wrote: "People that have T for M269 SNP (rs9786153) and SNP CTS10834(RS9785897) just means they are ancestral and is what is expected to be found in anyone that is not in the R1b haplogroup or in a different branch of R1b."
What you write is not true. There are people in the R-Z255 group who have the C allele for M269 and the T allele for CTS10834 (tested by YSEQ).
So how do you interpret these results? R-Z255 is far downstream of M269. We must not forget that there are also people who are both V88+ and M269+ (tested by FTDNA).
One of the main recommendations when registering Y SNPs in dbSNP is to provide the frequencies of the different alleles of the SNP in different populations.

grouza31
03-30-2016, 09:07 AM
Take the case of the SNP V88. This SNP was registered in dbSNP without the frequency distribution of alleles C and T in different populations worldwide.
On what basis has it been written for years, and is still being written, that R1b in Africa and the Middle East is dominated by the V88+ branch (i.e. the T allele), and in Europe by the M269 branch?
Because of these baseless claims, genetic projects for a long time tested only the V88 marker in Africa and the Middle East and only the M269 marker in Europe. This led to many people in Europe being classified as M269+ when they are actually V88+.
Note that the oldest human skeleton found anywhere so far that tested V88+ is in Europe, not in the Middle East or Africa.

smal
03-30-2016, 09:16 AM
There are several reasons for such discrepancies:
1. technical errors (the main reason)
2. recurrent and back mutations. In most cases they can be found among private mutations. Sometimes they can form new subclades.

That is why it is better to test several SNPs to define the haplogroup correctly. The NGS test (or SNP Pack) will be the best choice.

grouza31
03-30-2016, 09:53 AM
There are several reasons for such discrepancies:
1. tech errors (main reason)
2. recurrent and back mutations. In most cases they can be found among private mutations. Sometimes they can form new subclades.

That is why it is better to test several SNPs to define haplogroup correctly. The NGS test (or SNP Pack) will be the best choice.
Many thanks Smal.
What you say in 2. confirms the observations in population genetics. Recurrent and back mutations can lead to a random distribution of alleles of a given SNP in the same population.

grouza31
03-30-2016, 10:07 AM
Smal,
Can you tell me if the SNPs affected by recurrent and back mutations are clearly indicated on the Y haplotree?
Can you tell me if these recurrent and back mutations are taken into account in the estimation of the TMRCA, since they can change the value of the SNP allele?

smal
03-30-2016, 11:12 AM
Recurrent and back mutations can lead to a random distribution of alleles of a given SNP in the same population.

It is highly unlikely. The mutation rate of these SNPs is too low to change frequencies in a population. The effect of migrations is much more significant.

smal
03-30-2016, 11:19 AM
Can you tell me if the SNPs affected by recurrent and back mutations are clearly indicated on the Y haplotree?

They are listed as SNP.1, SNP.2, SNP.3, or SNP! (back mutation)


Can you tell me if these recurrent and back mutations are taken into account in the estimation of the TMRCA, since they can change the value of the SNP allele?
Yes
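
The naming convention described in the answer above can be checked mechanically. This is only a minimal sketch, assuming marker names follow exactly the pattern given there (a `.N` suffix for a recurrent occurrence, a trailing `!` for a back mutation). The function name `classify_marker` is made up for illustration and is not part of any ISOGG or YFull tool.

```python
import re

def classify_marker(name: str) -> str:
    """Classify a marker name by its suffix (assumed convention: SNP.1, SNP.2, SNP!)."""
    name = name.strip()
    if name.endswith("!"):
        return "back mutation"
    if re.fullmatch(r".+\.\d+", name):
        return "recurrent"
    return "ordinary"

# The R1b1a1a2 line of the 2016 tree at the top of the thread ends with
# L150.1/PF6274.1/S351.1, so those names carry the recurrent suffix.
for marker in ["M269", "L150.1", "PF6274.1", "S351.1"]:
    print(marker, "->", classify_marker(marker))
```

A name written as `SNP!` would come back as "back mutation"; names without either suffix are treated as ordinary markers.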

ArmandoR1b
03-30-2016, 12:12 PM
What you write is not true. There are people in the R-Z255 group who have the C allele for M269 and the T allele for CTS10834 (tested by YSEQ).
So how do you interpret these results? R-Z255 is far downstream of M269. We must not forget that there are also people who are both V88+ and M269+ (tested by FTDNA).
One of the main recommendations when registering Y SNPs in dbSNP is to provide the frequencies of the different alleles of the SNP in different populations.
All of the people in the examples you are providing need to have all of the upstream SNPs tested. If their upstream SNPs don't match the branching into one of the SNPs you mentioned, then they do not belong to the branch of that specific SNP. This is a very simple concept.
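
The upstream check described in this reply can be sketched in a few lines. The path uses one defining SNP per level from the 2016 ISOGG tree quoted at the top of the thread; the function name `branch_status` and the sample calls are hypothetical and only illustrate the reasoning, not any testing company's actual procedure.

```python
# One defining SNP per level, R1b down to R1b1a1a2, as listed in the
# 2016 ISOGG tree quoted at the top of the thread.
PATH_TO_M269 = ["M343", "L278", "L754", "L388", "P297", "M269"]

def branch_status(calls: dict[str, str], path: list[str]) -> str:
    """calls maps SNP name -> '+' (derived) or '-' (ancestral)."""
    ancestral = [snp for snp in path if calls.get(snp) == "-"]
    untested = [snp for snp in path if snp not in calls]
    if ancestral:
        return "not supported: ancestral for " + ", ".join(ancestral)
    if untested:
        return "undetermined: untested " + ", ".join(untested)
    return "consistent with the branch"

# Hypothetical samples, for illustration only.
fully_tested = {snp: "+" for snp in PATH_TO_M269}
single_snp_test = {"M269": "+"}
conflicting = {"M343": "+", "L278": "+", "L754": "-", "M269": "+"}

print(branch_status(fully_tested, PATH_TO_M269))
print(branch_status(single_snp_test, PATH_TO_M269))
print(branch_status(conflicting, PATH_TO_M269))
```

A single derived M269 result with nothing upstream tested stays "undetermined" here, which is the situation the reply is warning about.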

grouza31
03-30-2016, 02:53 PM
It is highly unlikely. The mutation rate of these SNPs is too low to change frequencies in a population. The effect of migrations is much more significant.

It depends on the type of SNP. Some SNPs change more rapidly than others.
Have a look at this extract and tell me what you think:

Reference Y chromosome nuclear (paternal line) whole molecule mutation rate (ysnpHMR60meg):
For this calculation M is not small but is instead quite large, i.e., in the order of 58,000,000. Thus if we use the simplification method I described above for the smaller marker counts, we get a large error in the result. Using the simplification method above we would get a Y chromosome (haplotype) mutation rate, .00000002 x 58,000,000 nucleotides (markers) in the molecule if completely sequenced and tested, equal to 1.2 per Y chromosome transmission event (birth of a new generation). The actual results using the equation described earlier in this paper of “probability of new haplotype = 1 – (1-mu)^M”, yields a Y chromosome (haplotype) mutation rate of 0.687. Thus if we could economically sequence the whole non-recombining region of the Y chromosome molecule we would expect to see a SNP change roughly about once in every generation. Because of the sheer number of nucleotides in the Y chromosome molecule's NRY region it is not likely to survive much more than one generation unchanged. Thus a Y chromosome would statistically survive about 1.45 generations without change based on the reference Y-DNA nucleotide mutation rate of 2x10^-8. And rounding things off to an integer, as a rule of thumb, I generally state in casual conversations that we can expect about one new SNP per generation. But finding that new SNP, would be a costly challenge with today's technology (as of 2005) ……...
http://www.kerchner.com/dnamutationrates.htm

Megalophias
03-30-2016, 06:40 PM
Take the case of the SNP V88. This SNP was recorded in dbSNP without the frequency distribution of the C and T alleles in different populations worldwide.
On what basis has it been written for years, and is it still being written, that R1b in Africa and the Middle East is dominated by the V88+ branch (i.e. the T allele) and that in Europe it is the M269 branch?
These baseless allegations meant that for a long time genetic projects tested only the V88 marker in Africa and the Middle East and only the M269 marker in Europe. This led to many people in Europe being classified as M269+ when they are actually V88+.
Note that the oldest human skeleton found so far anywhere in the world that tested V88+ is in Europe, not in the Middle East or Africa.

I don't know who "we" is, but researchers have actually tested people in Africa and Europe for M269 and V88 both, and yes Africa is mostly V88 (with some M269) and Europe is mostly M269 (with a little V88). The Middle East is not dominated by V88, it is mostly M269 as well. There are also lots of other R1b clades that are neither M269 nor V88.

ArmandoR1b
03-30-2016, 08:09 PM
It depends on the type of SNP. Some SNPs change more rapidly than others.
Have a look at this extract and tell me what you think:

Reference Y chromosome nuclear (paternal line) whole molecule mutation rate (ysnpHMR60meg):
For this calculation M is not small but is instead quite large, i.e., in the order of 58,000,000. Thus if we use the simplification method I described above for the smaller marker counts, we get a large error in the result. Using the simplification method above we would get a Y chromosome (haplotype) mutation rate, .00000002 x 58,000,000 nucleotides (markers) in the molecule if completely sequenced and tested, equal to 1.2 per Y chromosome transmission event (birth of a new generation). The actual results using the equation described earlier in this paper of “probability of new haplotype = 1 – (1-mu)^M”, yields a Y chromosome (haplotype) mutation rate of 0.687. Thus if we could economically sequence the whole non-recombining region of the Y chromosome molecule we would expect to see a SNP change roughly about once in every generation. Because of the sheer number of nucleotides in the Y chromosome molecule's NRY region it is not likely to survive much more than one generation unchanged. Thus a Y chromosome would statistically survive about 1.45 generations without change based on the reference Y-DNA nucleotide mutation rate of 2x10^-8. And rounding things off to an integer, as a rule of thumb, I generally state in casual conversations that we can expect about one new SNP per generation. But finding that new SNP, would be a costly challenge with today's technology (as of 2005) ……...
http://www.kerchner.com/dnamutationrates.htm
You have taken that out of context. They are saying that there is a new SNP about every 1.45 generations. They are not saying that a specific SNP will change rapidly which is how you took it. There are millions of people that are derived for M269 and many other SNPs that are downstream from M269. The number of people that are derived for SNPs downstream from M269 but ancestral for M269 will be extremely small when technical errors are taken out of the equation. You are trying to point out a problem that is minuscule and making a mountain out of a molehill due to a misunderstanding on your part of the data.
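
The arithmetic in the quoted extract, and the distinction drawn in this reply between "some site on the chromosome changes" and "one specific SNP changes", can be reproduced directly. The two input values come from the extract; the variable names are illustrative only.

```python
mu = 2e-8        # per-nucleotide mutation rate per generation (from the extract)
M = 58_000_000   # nucleotides considered (from the extract)

linear = mu * M                 # the "simplification" estimate: expected changes per transmission
any_site = 1 - (1 - mu) ** M    # probability that at least one of the M sites changes
generations = 1 / any_site      # expected generations until the first change
one_site = mu                   # probability that one particular site (one named SNP) changes

print(f"linear estimate:               {linear:.2f}")
print(f"P(at least one site changes):  {any_site:.3f}")
print(f"generations until a change:    {generations:.2f}")
print(f"P(one specific site changes):  {one_site:.0e}")
```

The first three figures describe the whole chromosome; only the last one applies to an individual marker such as M269 or V88, under the extract's assumed rate.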

grouza31
03-31-2016, 09:13 AM
Contrary to what is often assumed, recurrent back mutation of Y chromosome SNPs is very common and affects many SNPs, at a rate some five orders of magnitude higher than the base mutation rate.
In a given population, the number of individuals affected by recurrent back mutation of a given Y-SNP is related to the frequency of the derived allele in that population. I will let you calculate the number of individuals involved in Europe for a SNP whose derived allele frequency is 70% in the population.
Here is a summary of some studies:

1)-In their study published in 2013 in Forensic Science International: Genetics and entitled “Multiple recurrent mutations at four human Y-chromosomal single nucleotide polymorphism sites in a 37 bp sequence tract on the ARSDP1 pseudogene”,
Niederstätter H, Berger B, Erhart D, Willuweit S, Geppert M, Gassner C, Schennach H, Parson W, Roewer L. wrote:

Y-SNPs are usually considered the result of unique single base substitutions (unique event polymorphisms) during human evolution. However, there are exceptions, even when barring "private" mutations. For instance, about 2% of the Y-SNP positions listed in the 2008 update of the Y chromosome consortium (YCC) Y-SNP tree show evidence of recurrent mutations (homoplasies). Further, nearly 3% of the Y-SNPs in the YCC 2008 tree are present in more than just one copy on the MSY

Here, we report findings for 17 samples from a population study comprising specimens from ∼3700 men living in Tyrol (Austria), indicating apparent homoplasic mutations at four Y-SNP loci on haplogroup R-M412/L51/S167, R-U152/S28, and L-M20 Y chromosomes. The affected Y-SNPs P41, P37, L202, and L203 mapped to a 37bp region on Yq11.21. Observing in multiple phylogenetic contexts up to four homoplasic mutations within such a short sequence tract is unlikely to result from a series of independent parallel mutations. Hence, we rather propose X-to-Y gene conversion as a more likely scenario. Practical implications arising from markers exhibiting paralogues on the Y chromosome or sites with a high propensity to recurrent mutation for database searches are addressed

2)-In their study entitled “The case of the unreliable SNP: recurrent back-mutation of Y-chromosomal marker P25 through gene conversion”
Susan M. Adams, Turi E. King, Elena Bosch and Mark A. Jobling (Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, UK) observed recurrent back mutation of SNP P25 using samples from two western European regions (421 males originating from Great Britain and 1000 males from the Iberian Peninsula).

3)-Using an indirect method, a rate of 2.2 x 10^-4 conversions per duplicated nucleotide per generation, some five orders of magnitude higher than the base mutation rate, has been estimated:
(S. Rozen, H. Skaletsky, J.D. Marszalek, P.J. Minx, H.S. Cordum, R.H.Waterston, R.K. Wilson, D.C. Page, Abundant gene conversion between arms of massive palindromes in human and ape Y chromosomes, Nature 423 (2003) 873-876)
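
The calculation invited in this post can be sketched as follows, with loud caveats. The 70% derived-allele frequency and the 2.2 x 10^-4 conversion rate are the figures cited above; the population size is purely hypothetical, and the conversion rate is stated for duplicated nucleotides only, so applying it to an arbitrary Y-SNP is an assumption made here for illustration and not something the cited studies claim.

```python
def expected_events(men: int, derived_freq: float, rate: float) -> float:
    """Expected number of new back-mutation events in one generation of
    father-to-son transmissions among men carrying the derived allele."""
    return men * derived_freq * rate

MEN = 1_000_000            # hypothetical population size, for illustration only
DERIVED_FREQ = 0.70        # derived-allele frequency used in the post above
CONVERSION_RATE = 2.2e-4   # per duplicated nucleotide per generation (Rozen et al. 2003, as cited)
BASE_RATE = 2e-8           # per-nucleotide rate quoted earlier in the thread

print("site in a duplicated region:", expected_events(MEN, DERIVED_FREQ, CONVERSION_RATE))
print("ordinary single-copy site:  ", expected_events(MEN, DERIVED_FREQ, BASE_RATE))
```

Which of the two rates is appropriate depends on whether the SNP in question actually sits in a duplicated (palindromic or paralogous) region.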

lgmayka
03-31-2016, 02:39 PM
The affected Y-SNPs P41, P37, L202, and L203 mapped to a 37bp region on Yq11.21. Observing in multiple phylogenetic contexts up to four homoplasic mutations within such a short sequence tract is unlikely to result from a series of independent parallel mutations. Hence, we rather propose X-to-Y gene conversion as a more likely scenario.
Years ago, Thomas Krahn pointed out the possibility of X-to-Y copying. That is one reason his company Yseq is so strict about the Y chromosome locations for which he is willing to offer SNP tests--he avoids a Y region that is too similar to an X region (or to another Y region).

You can use this YFull search tool (https://yfull.com/search-snp-in-tree/) to see where (in the haplotree) YFull has found a given SNP.

Megalophias
03-31-2016, 04:03 PM
Contrary to what is often assumed, recurrent back mutation of Y chromosome SNPs is very common and affects many SNPs, at a rate some five orders of magnitude higher than the base mutation rate.
In a given population, the number of individuals affected by recurrent back mutation of a given Y-SNP is related to the frequency of the derived allele in that population. I will let you calculate the number of individuals involved in Europe for a SNP whose derived allele frequency is 70% in the population.
And are M269 or V88 among these unstable mutations?

Since there are many millions of men with M269, naturally we expect a few men to have recurrent V88+ or a back-mutation to M269-. As smal said above, these are rare private lineages.

Honestly I don't know what you are trying to say here. Are you suggesting that there is a major clade of M269+ V88+ men that has somehow been missed? Or that M269 or V88 is prone to recurrent mutation and that this has been missed?

ArmandoR1b
03-31-2016, 04:45 PM
Contrary to what is often assumed, recurrent back mutation of Y chromosome SNPs is very common and affects many SNPs, at a rate some five orders of magnitude higher than the base mutation rate.
In a given population, the number of individuals affected by recurrent back mutation of a given Y-SNP is related to the frequency of the derived allele in that population. I will let you calculate the number of individuals involved in Europe for a SNP whose derived allele frequency is 70% in the population.
Here is a summary of some studies:

1)-In their study published in 2013 in Forensic Science International: Genetics and entitled “Multiple recurrent mutations at four human Y-chromosomal single nucleotide polymorphism sites in a 37 bp sequence tract on the ARSDP1 pseudogene”,
Niederstätter H, Berger B, Erhart D, Willuweit S, Geppert M, Gassner C, Schennach H, Parson W, Roewer L. wrote:

Y-SNPs are usually considered the result of unique single base substitutions (unique event polymorphisms) during human evolution. However, there are exceptions, even when barring "private" mutations. For instance, about 2% of the Y-SNP positions listed in the 2008 update of the Y chromosome consortium (YCC) Y-SNP tree show evidence of recurrent mutations (homoplasies). Further, nearly 3% of the Y-SNPs in the YCC 2008 tree are present in more than just one copy on the MSY

Here, we report findings for 17 samples from a population study comprising specimens from ∼3700 men living in Tyrol (Austria), indicating apparent homoplasic mutations at four Y-SNP loci on haplogroup R-M412/L51/S167, R-U152/S28, and L-M20 Y chromosomes. The affected Y-SNPs P41, P37, L202, and L203 mapped to a 37bp region on Yq11.21. Observing in multiple phylogenetic contexts up to four homoplasic mutations within such a short sequence tract is unlikely to result from a series of independent parallel mutations. Hence, we rather propose X-to-Y gene conversion as a more likely scenario. Practical implications arising from markers exhibiting paralogues on the Y chromosome or sites with a high propensity to recurrent mutation for database searches are addressed

2)-In their study entitled “The case of the unreliable SNP: recurrent back-mutation of Y-chromosomal marker P25 through gene conversion”
Susan M. Adams, Turi E. King, Elena Bosch and Mark A. Jobling (Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, UK) observed recurrent back mutation of SNP P25 using samples from two western European regions (421 males originating from Great Britain and 1000 males from the Iberian Peninsula).

3)-Using an indirect method, a rate of 2.2 x 10^-4 conversions per duplicated nucleotide per generation, some five orders of magnitude higher than the base mutation rate, has been estimated:
(S. Rozen, H. Skaletsky, J.D. Marszalek, P.J. Minx, H.S. Cordum, R.H.Waterston, R.K. Wilson, D.C. Page, Abundant gene conversion between arms of massive palindromes in human and ape Y chromosomes, Nature 423 (2003) 873-876)
You are again trying to apply broad statements to specific SNPs, which should not be done. M269 and V88 aren't recurrent in 2%-3% of the population. Even if they were, it wouldn't change the factual statement that "Paragroup R1b1* and haplogroup R1b1c-V88 are found most frequently in SW Asia and Africa. The African examples are almost entirely within R1b1c and are associated with the spread of Chadic languages" when used in the context of the 2011-2014 ISOGG tree.

grouza31
04-01-2016, 10:11 AM
.. The African examples are almost entirely within R1b1c and are associated with the spread of Chadic languages" when used in the context of the 2011-2014 ISOGG tree.

What are Chadic languages for you?
Some people wrote it without any basis and you repeat it without any verification. Originally, many people in Africa (north to south, east to west) who are V88+ did not speak any Chadic language. Nobody knows their original language. When members of the V88 group arrived in Africa (whether in North Africa or East Africa), they quickly mixed with different local populations and quickly abandoned their original language for local languages. They were a minority. Many of the V88+ members in Central Africa come from North Africa. From Central Africa, they spread to West Africa and Equatorial Africa, often for commercial or religious reasons. By the 12th century, many members of the V88 group in Central Africa had converted to Islam and were themselves among those who spread Islam in the rest of West Africa and Equatorial Africa. They did not spread the languages they spoke in Central Africa to other regions because, first, they were a minority compared to the local populations they met and, second, the basic rule among these V88+ members is to quickly adopt the languages and customs of the local people.
Other members of the V88+ group who were in East Africa spread to South Africa and are still a minority compared to the rest.
Note that the members of the V88 group in Central Africa are a minority (less than 1%) compared to the rest of the population, even if many of them are found in some villages.
Among speakers of Chadic languages, we encounter members of the E1b branch more often than V88.

Cofgene
04-01-2016, 10:43 AM
Years ago, Thomas Krahn pointed out the possibility of X-to-Y copying. That is one reason his company Yseq is so strict about the Y chromosome locations for which he is willing to offer SNP tests--he avoids a Y region that is too similar to an X region (or to another Y region).

You can use this YFull search tool (https://yfull.com/search-snp-in-tree/) to see where (in the haplotree) YFull has found a given SNP.

The frequency of occurrence of this X-to-Y process needs to be determined and if present assigned to specific haplogroup branches. There are technical ways to evaluate and address the problem of the similar regions that are not part of the testing toolkit available to the general community.

grouza31
04-01-2016, 12:17 PM
And are M269 or V88 among these unstable mutations?

Since there are many millions of men with M269, naturally we expect a few men to have recurrent V88+ or a back-mutation to M269-. As smal said above, these are rare private lineages.

Honestly I don't know what you are trying to say here. Are you suggesting that there is a major clade of M269+ V88+ men that has somehow been missed? Or that M269 or V88 is prone to recurrent mutation and that this has been missed?

At this rate, private SNPs will outnumber normal SNPs.
When mutations disrupt the established dogma, and people do not seek or do not want to understand what is happening, the mutations are classified as private. That way we talk about them no more.
Try cross-checking your Big Y or FGC Y Elite data with others. You will see that there are tons of SNPs that are found among several people who belong to different branches. In my case, being in the sub-branches of V88, I have many tens of SNPs found in the sub-branches of M269.
The latest SNP, which I found just today, is FGC38304 (14722624-G-A). This SNP, with the same allele as mine, is found in a person of the R-S16864 group, which is well below R-P312. As usual, people will reply that this may be a test error or a private mutation. It's over, we close the door and talk no more.

I remind you of this SNP that Smal placed in a branch of V88. Note that I do not question the work of Smal. He is a great expert who did a good job. But I need more explanation of the phenomena.

ArmandoR1b
04-01-2016, 12:34 PM
What are Chadic languages for you?
I go by what linguists consider Chadic languages https://en.wikipedia.org/wiki/Chadic_languages


Some people wrote it without any basis and you repeat it without any verification. Originally, many people in Africa (north to south, east to west) who are V88+ did not speak any Chadic language. Nobody knows their original language. When members of the V88 group arrived in Africa (whether in North Africa or East Africa), they quickly mixed with different local populations and quickly abandoned their original language for local languages. They were a minority. Many of the V88+ members in Central Africa come from North Africa. From Central Africa, they spread to West Africa and Equatorial Africa, often for commercial or religious reasons. By the 12th century, many members of the V88 group in Central Africa had converted to Islam and were themselves among those who spread Islam in the rest of West Africa and Equatorial Africa. They did not spread the languages they spoke in Central Africa to other regions because, first, they were a minority compared to the local populations they met and, second, the basic rule among these V88+ members is to quickly adopt the languages and customs of the local people.
Other members of the V88+ group who were in East Africa spread to South Africa and are still a minority compared to the rest.
Note that the members of the V88 group in Central Africa are a minority (less than 1%) compared to the rest of the population, even if many of them are found in some villages.
Among speakers of Chadic languages, we encounter members of the E1b branch more often than V88.
It doesn't matter that they are a minority overall. There is nowhere else in the world that V88 can be found at such high levels within subpopulations http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987365/table/tbl1/

ArmandoR1b
04-01-2016, 12:44 PM
At this rate, private SNPs will outnumber normal SNPs.
When mutations disrupt the established dogma, and people do not seek or do not want to understand what is happening, the mutations are classified as private. That way we talk about them no more.
People definitely want to talk about private SNPs with their close relatives. Why would I care which private SNPs you have?


Try cross-checking your Big Y or FGC Y Elite data with others. You will see that there are tons of SNPs that are found among several people who belong to different branches.
Once again you are ignoring an unwritten cardinal rule about SNPs. If you are ancestral for upstream SNPs in the branch the SNP was initially defined in and the recurrent SNP isn't also shared by other people in your branch then the recurrent SNP is irrelevant.


In my case, being in the sub-branches of V88, I have many tens of SNPs found in the sub-branches of M269.
The latest SNP, which I found just today, is FGC38304 (14722624-G-A). This SNP, with the same allele as mine, is found in a person of the R-S16864 group, which is well below R-P312. As usual, people will reply that this may be a test error or a private mutation. It's over, we close the door and talk no more.
We talk no more unless you can find a lot of other people in the same branch as you with the same mutation. Otherwise it is irrelevant.

grouza31
04-01-2016, 02:09 PM
People definitely want to talk about private SNPs with their close relatives. Why would I care which private SNPs you have?


Once again you are ignoring an unwritten cardinal rule about SNPs. If you are ancestral for upstream SNPs in the branch the SNP was initially defined in and the recurrent SNP isn't also shared by other people in your branch then the recurrent SNP is irrelevant.


We talk no more unless you can find a lot of other people in the same branch as you with the same mutation. Otherwise it is irrelevant.

When I speak, I bring you the scientific evidence. I give you references to scientific studies. You speak without scientific evidence. You repeat what others often say. Bring me evidence from scientific publications for what you say and I will adhere to what you say.

grouza31
04-01-2016, 02:51 PM
I go by what linguists consider Chadic languages https://en.wikipedia.org/wiki/Chadic_languages


It doesn't matter that they are a minority overall. There is nowhere else in the world that V88 can be found at such high levels within subpopulations http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987365/table/tbl1/

What you say is not true. There have been many criticisms of this study. For Africa, the study selected populations where we know there was a lot of mixing with people coming from the Middle East and Europe. For Europe and Asia, on the other hand, we have no details about the different ethnic groups covered by the study. It just says Western Europeans, North Western Europeans, Central Europeans, Italians, Corsicans, ...
This distorts the results. Do the same study taking the same number of people as in Europe or Asia and make blocks such as North Africa, West Africa, South Africa and Central Africa, Nigerians, Algerians, ... and you will see that the level of V88 within the subpopulations will not be the same.

ArmandoR1b
04-02-2016, 02:10 PM
When I speak, I bring you the scientific evidence. I give you references to scientific studies. You speak without scientific evidence. You repeat what others often say. Bring me evidence from scientific publications for what you say and I will adhere to what you say.
When you speak you speak with erroneous conclusions because you think the scientific evidence supports your ideas.

ArmandoR1b
04-02-2016, 02:47 PM
What you say is not true. There have been many criticisms of this study. For Africa, the study selected populations where we know there was a lot of mixing with people coming from the Middle East and Europe. For Europe and Asia, on the other hand, we have no details about the different ethnic groups covered by the study. It just says Western Europeans, North Western Europeans, Central Europeans, Italians, Corsicans, ...
This distorts the results. Do the same study taking the same number of people as in Europe or Asia and make blocks such as North Africa, West Africa, South Africa and Central Africa, Nigerians, Algerians, ... and you will see that the level of V88 within the subpopulations will not be the same.
YFull has a database of the 1000 Genomes samples and of paying YFull customers, most of whom are from the U.S. and Europe. There are 1217 R1b people in YFull. 1203 are M269 and all of them are positive for downstream SNPs, which means all of them are positive for upstream SNPs from the point of their terminal SNP with at least one other person. Only 10 are V88. This goes to show that V88 is very rare in the large number of 1000 Genomes samples and YFull customers. Of those 10 V88 people, 2 are from Cagliari, which could be from Middle Eastern or African ancestry, 2 are from Saudi Arabia, 1 is from Kuwait, 1 is from Peru, which could be from Middle Eastern or African ancestry, and 1 is from Nigeria. The pattern from YFull speaks for itself. M269 is much more common in Europe and V88 is much more common in the Middle East and Africa. If there were more Middle Eastern and African customers at YFull this would be even more evident.

grouza31
04-04-2016, 09:19 AM
YFull has a database of the 1000 Genomes samples and of paying YFull customers, most of whom are from the U.S. and Europe. There are 1217 R1b people in YFull. 1203 are M269 and all of them are positive for downstream SNPs, which means all of them are positive for upstream SNPs from the point of their terminal SNP with at least one other person. Only 10 are V88. This goes to show that V88 is very rare in the large number of 1000 Genomes samples and YFull customers. Of those 10 V88 people, 2 are from Cagliari, which could be from Middle Eastern or African ancestry, 2 are from Saudi Arabia, 1 is from Kuwait, 1 is from Peru, which could be from Middle Eastern or African ancestry, and 1 is from Nigeria. The pattern from YFull speaks for itself. M269 is much more common in Europe and V88 is much more common in the Middle East and Africa. If there were more Middle Eastern and African customers at YFull this would be even more evident.

The only thing you can say with the data from your project and from 1000 Genomes is that, among the total number of V88+ persons counted, the majority are from Africa or the Middle East. You cannot conclude that, because the majority of the V88+ persons in your project and in 1000 Genomes are from Africa or the Middle East, V88+ is more frequent in Africa or the Middle East than in Europe. Such a conclusion is totally false. I am going to prove it with a specific case.
Let us take the data concerning V88+ in Smal's R1b-M343 (xM269) v3 tree.
According to Smal, these data are from the following projects: Francalacci et al., FTDNA, 1000 Genomes, Human Genome Diversity, Trombetta et al., Karmin et al.
In these data, we count a total of 44 V88+ persons.
Among these 44 persons, 30 (68%) are Sardinians, i.e. Europeans (unless people consider that Sardinia is not a part of Europe), approximately 9% come from Africa, and 9% come from the Middle East. As 68% of the V88+ persons come from Europe, can I conclude that V88 is more frequent in Europe than in Africa or the Middle East? My answer is NO, whereas it is YES if I follow your reasoning and that of most people.
We must be careful of overly hasty conclusions in projects.
The only way to reach reliable conclusions is to have the allelic frequencies of V88 in populations on every continent.
But at present there is no such data.

Cofgene
04-04-2016, 10:44 AM
Once again you are ignoring an unwritten cardinal rule about SNPs. If you are ancestral for upstream SNPs in the branch the SNP was initially defined in and the recurrent SNP isn't also shared by other people in your branch then the recurrent SNP is irrelevant.


Cardinal rule? OK if a variant shows up every other generation. Not OK for occurrences with multiple levels between the recurrent appearances. The 'rule' derives from the technical issue of being able to track the occurrence. People are lazy and like to throw useful data out if it is hard to track.

Megalophias
04-04-2016, 02:51 PM
The only way to reach reliable conclusions is to have the allelic frequencies of V88 in populations on every continent. But at present there is no such data.
There are loads of studies on Y chromosomes in populations from different continents. Why do you not take them into account?

grouza31
04-04-2016, 03:33 PM
There are loads of studies on Y chromosomes in populations from different continents. Why do you not take them into account?
I'm talking about V88, not M269. If you have studies with the frequency distribution of the V88 C and T alleles in populations from different continents, please provide them to us. I remind you that, unlike M269, V88 was registered in dbSNP without allelic frequencies. Even in the 1000 Genomes project, allele frequencies are not available, unless you have this data in your possession.
The calculation of allele frequencies is not limited to analysis of the Y chromosome. There is much more work to do afterwards and it requires a lot of time and money.

grouza31
04-04-2016, 03:52 PM
There are loads of studies on Y chromosomes in populations from different continents. Why do you not take them into account?
The accessible V88 data based on complete sequencing of the Y chromosome, coming from several sources, are those published in the tree established by Smal. If I only look at these data, I conclude that V88 is more common in Europe than in Africa or the Middle East, with a percentage of 68% in Europe.
Is this true or false for you?

AJL
04-04-2016, 04:51 PM
The only thing that you can say with the data from your project and that from 1000 genomes is that: among the total number of V88+ persons counted, the majority are from Africa or Middle East.

No, that is untrue. We can also say that the proportion of Europeans tested is much larger than Africans and Middle Easterners.

What we are left with, then, is a very high proportion of V88 in areas of very low testing density, and a very low proportion in areas of high testing density. That is of statistical significance.

ADW_1981
04-04-2016, 05:07 PM
V88 data with a complete sequencing of Y chromosome coming from several sources and that are accessible are those published in the tree established by Smal. If I only look at these data, I conclude that V88 is more common in Europe than in Africa or Middle East with a percentage of 68% in Europe.
Is this true or false for you?

V88 or pre-V88 might have been a WHG lineage that was in Southern Europe and was absorbed by incoming farmers. We're talking 8000+ years ago, so it's difficult to say one way or another. Outside of Sardinia, V88 is more common in Middle Eastern, African, and Jewish groups. There are only a handful of non-Sardinian, non-Jewish European V88 members.

Megalophias
04-04-2016, 05:42 PM
I'm talking about V88, not M269. If you have studies with the distribution of V88 C and T allele frequencies in populations from different continents, please provide them to us. I remind you that unlike M269, V88 was registered at dbSNP without allelic frequencies. Even in the 1000 Genomes Project, allele frequencies are not available, unless you have this data in your possession. The calculation of allele frequencies is not limited to analysis of Y chromosome. There is much more work to do after and it requires a lot of time and money.

V88 is a transition to T at position 4862861 on the Y chromosome (hg19). In any study if someone is reported as V88+ then they have T there, if V88- they presumably have the ancestral C. You only have one Y chromosome, so the frequency of the 4862861T allele is just the frequency of V88. What's so complicated about it?
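
To make that concrete, here is a minimal Python sketch. The sample counts are invented for illustration and come from no study, and the function name is just a placeholder. Because each man carries a single Y chromosome, the frequency of the T allele at the V88 position is the same number as the frequency of V88+ men in the sample.

```python
from collections import Counter

def allele_frequencies(calls):
    """Frequency of each allele among haploid calls (one call per man)."""
    counts = Counter(calls)
    total = len(calls)
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical population sample: 3 V88+ men (T) and 17 V88- men (C).
calls = ["T"] * 3 + ["C"] * 17

freqs = allele_frequencies(calls)

# Haplogroup frequency counted directly from V88+ men.
v88_positive = sum(1 for c in calls if c == "T")
haplogroup_frequency = v88_positive / len(calls)

print(freqs["T"], haplogroup_frequency)
```

With a diploid marker the two numbers could differ (heterozygotes), which is why the distinction matters on autosomes but not here.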

Myres et al (2011), "A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe" is useful especially for Europe, to a lesser degree Asia, while Cruciani et al (2010) "Human Y-chromosome haplogroup R-V88: a paternal genetic record of early-mid Holocene trans-Saharan connections and the spread of Chadic languages" has an extensive but somewhat patchy set of Africans as well as Europeans and Asians. Bekada et al (2013) "Introducing the Algerian Mitochondrial DNA and Y-Chromosome Profiles into the North African Landscape" has extensive pooled samples from the literature for southern Europe, North Africa, and the Middle East. For more detailed information on the Middle East very useful studies are Grugni et al (2012) "Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians"; Cinnioglu et al (2004), "Excavating Y-chromosome haplotype strata in Anatolia"; Al-Zahery et al (2011), "In search of the genetic footprints of Sumerians: a survey of Y-chromosome and mtDNA variation in the Marsh Arabs of Iraq"; Abu-Amero et al (2009), "Saudi Arabian Y-chromosome diversity and its relationship with nearby regions"; Cadenas et al (2007), "Y-chromosome diversity characterizes the Gulf of Oman"; Zalloua et al (2008), "Y-chromosomal diversity in Lebanon is structured by recent historical events"; Hovhannisyan et al (2014), "Different waves and directions of Neolithic migrations in the Armenian Highland". 
For Africa Wood et al (2005), "Contrasting patterns of Y chromosome and mtDNA variation in Africa: evidence for sex-biased demographic processes"; Berniell-Lee et al (2009), "Genetic and demographic implications of the Bantu Expansion: insights from human paternal lineages"; de Filippo et al (2011), "Y-chromosomal variation in Sub-Saharan Africa: insights into the history of Niger-Congo groups"; Rosa et al (2007), "Y-chromosomal diversity in the population of Guinea-Bissau: a multiethnic perspective"; Hassan et al (2008), "Y-chromosome variation among Sudanese: restricted gene flow, concordance with language, geography, and history"; Larmuseau et al (2015), "The paternal landscape along the Bight of Benin - testing regional representativeness of West-African population samples using Y-chromosomal markers"; Barbieri et al (2015), "Refining the Y chromosome phylogeny with southern African sequences".


V88 data with a complete sequencing of Y chromosome coming from several sources and that are accessible are those published in the tree established by Smal. If I only look at these data, I conclude that V88 is more common in Europe than in Africa or Middle East with a percentage of 68% in Europe.
You mean 68% of the members of this (highly geographically biased) sample came from Europe, not that the frequency of V88 in Europe is 68%. You are looking at a sample of only V88, which tells you absolutely nothing about the proportion of V88 in the source populations. That is not the same as what ArmandoR1b was doing, because he is looking at the Y Full tree which includes all varieties of Y haplogroups, not just V88 - of course that is only a crude and biased approximation, but it is okay in principle, while your counterexample is not.
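
A small Python sketch of the difference between the two quantities. All counts below are hypothetical, chosen only so that 68% of the V88+ samples are European; they are not taken from Smal's tree, YFull, or any paper.

```python
# Hypothetical counts, invented for illustration only.
tested = {"Europe": 10000, "Africa": 200, "Middle East": 300}   # men tested
v88_pos = {"Europe": 68, "Africa": 20, "Middle East": 12}      # V88+ among them

total_v88 = sum(v88_pos.values())

# Share of the V88+ samples that come from each region.
# This mostly reflects where people get tested.
share_of_v88_samples = {r: v88_pos[r] / total_v88 for r in v88_pos}

# Frequency of V88 within each region's tested men.
# This is the quantity the population surveys estimate.
frequency_within_region = {r: v88_pos[r] / tested[r] for r in tested}

for region in tested:
    print(region, share_of_v88_samples[region], frequency_within_region[region])
```

In this made-up sample Europe supplies most of the V88+ men while having the lowest V88 frequency, which is only possible because a V88-only list carries no information about how many men were tested in each region.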

Silesian
04-04-2016, 09:28 PM
When I speak, I bring you the scientific evidence. I give you references to scientific studies. You speak without the scientific evidence. You repeat what others often say. Bring me the evidence by scientific publications of what you say and I will adhere to what you say.
This is not a scientific publication just an observation using some helpful tools like K15-Eurogenes.
Some K15 results to compare various hypotheses. You can compare cluster of samples that are around [5000BCE+/- Like,ydna-H2-G2a2-I2a1-R1b1]
For example, compare East Med and Red Sea levels: I2a1b1 with Els Trocs Spain R1b-M415+, M343+, [L754 equivalent: L774/PF6245/YSC277+, PF1144+, V88 equivalent: PF6376+], M478-, PF6399-, L265-, L150-, M269-, V35-, V69-
Spain_EN I0412 Els Trocs, Spain 5310-5206 calBCE 42.5 0.5 N1a1a1 I2a1b1: Red Sea and East Med 27.32%
Els Trocs, Spain 5178-5066 calBCE 42.5 0.5 pre-T2c1d2 R1b1: Red Sea and East Med 12.96%
https://docs.google.com/spreadsheets/d/1xS830gSmb1QvPdrZqcF0YOdygLCIyDNJP5IXO-QychI/edit#gid=1351765170

grouza31
04-05-2016, 08:42 AM
You mean 68% of the members of this (highly geographically biased) sample came from Europe, not that the frequency of V88 in Europe is 68%. You are looking at a sample of only V88, which tells you absolutely nothing about the proportion of V88 in the source populations. That is not the same as what ArmandoR1b was doing, because he is looking at the Y Full tree which includes all varieties of Y haplogroups, not just V88 - of course that is only a crude and biased approximation, but it is okay in principle, while your counterexample is not.

This is exactly what ArmandoR1b has done at YFull. And it is also what many people did in genetic projects.
If I take all the haplogroups on the Smal tree, it does not alter the fact that Europe remains always majority with 49% for V88, Africa approximately 7% for V88 and Middle East approximately 7% for V88.
Now, if before calculating the percentages I weight the V88 count (from every continent and contained in the Smal tree) by the total number of inhabitants of each continent, I get the following result: of 100% of V88 worldwide, 90% comes from Europe, 6% from Africa, 3.7% from Asia.
Don't you think that it is necessary to be very careful when we treat statistical data, and that it is necessary to avoid too hasty conclusions?

AJL
04-05-2016, 03:23 PM
Europe remains always majority

Because of hideous database bias, which everyone knows about. Statisticians use proportions rather than raw numbers to overcome precisely the ridiculous and illogical "brute force sheer number" approach that you are using. If we used your approach, we would have to assume there were very few East Asian or South Asian people in the world, because there are very few in the database.

Megalophias
04-05-2016, 03:38 PM
Don't you think that it is necessary to be very careful when we treat statistical data, and that it is necessary to avoid too hasty conclusions?
Sure. But the information we have about overall frequencies of M269 and V88 is not based on a small number of private samples, but on extensive surveys in many countries carried out by academic researchers. So there's no point in quibbling about trees with a small number of selected samples.

I still don't understand what this entire conversation is about. Distribution of V88? How allele frequencies work in the context of Y chromosomes? What are you actually asking?

grouza31
04-05-2016, 03:42 PM
Because of hideous database bias, which everyone knows about. Statisticians use proportions rather than raw numbers to overcome precisely the ridiculous and illogical "brute force sheer number" approach that you are using. If we used your approach, we would have to assume there were very few East Asian or South Asian people in the world, because there are very few in the database.
If you had followed the discussion from the beginning you would never have answered me like that. I've always said the percentages are false. I took this example to show people that if I reason like them, if I do the same calculations as them but using data from other projects, the result is percentages that are totally opposite to what they say or write. In the current state of things, one cannot draw any conclusion on the distribution of V88 worldwide. The data available on V88 are too scarce to draw reliable conclusions. Therefore, everyone must stop making speculative conclusions about the distribution of V88 worldwide. No one knows anything about it. People need much more data first.

AJL
04-05-2016, 03:46 PM
If you had followed the discussion from the beginning you would never have answered me like that. I've always said the percentages are false. I took this example to show people that if I reason like them, if I do the same calculations as them but using data from other projects, the result is percentages that are totally opposite to what they say or write. In the current state of things, one cannot draw any conclusion on the distribution of V88 worldwide. The data available on V88 are too scarce to draw reliable conclusions. Therefore, everyone must stop making speculative conclusions about the distribution of V88 worldwide. No one knows anything about it. People need much more data first.

I did follow it, but like Megalophias I am struggling to understand what point you are trying to make. There's certainly enough data to show we have more than 3 standard deviations higher concentrations of V88 in Africans and Near Easterners, and a few southern Mediterranean people, than we do in all Europeans.
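
One common way to express a gap between two sample proportions in standard-deviation units is a pooled two-proportion z statistic. The sketch below is only an illustration of that idea, not necessarily the calculation behind the figure above, and the counts are hypothetical rather than taken from any study.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error of the difference under the hypothesis of equal proportions.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts (V88+ men, men tested); not taken from any study.
z = two_proportion_z(20, 200, 68, 10000)
print(round(z, 1))
```

A z value above 3 means the two proportions differ by more than three standard errors, so even a small sample from a lightly tested region can show a clear difference when the frequency gap is large.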

Gravetto-Danubian
04-05-2016, 09:14 PM
I'm sensing the crux of his argument is that V88 is actually more frequent in Europe, thus is a "European" lineage

VinceT
04-05-2016, 09:54 PM
I'm sensing the crux of his argument is that V88 is actually more frequent in Europe, thus is a "European" lineage

Which is complete rubbish. Compare sample frequencies for R1b-V88 in Northern Africa and Central Africa, with Europe, in Table 1 of Cruciani et al (2010), "Human Y chromosome haplogroup R-V88: a paternal genetic record of early mid Holocene trans-Saharan connections and the spread of Chadic languages" (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987365/).

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987365/table/tbl1/

grouza31
04-06-2016, 11:12 AM
Which is complete rubbish. Compare sample frequencies for R1b-V88 in Northern Africa and Central Africa, with Europe, in Table 1 of Cruciani et al (2010), "Human Y chromosome haplogroup R-V88: a paternal genetic record of early mid Holocene trans-Saharan connections and the spread of Chadic languages" (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987365/).

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987365/table/tbl1/

This is not a comparison of the V88 allele frequency distribution among the inhabitants of Africa, Europe, and the Middle East. It's only the percentage calculation of people who are V88+ in his project. It's the same calculation I did using the data of Smal, but weighting the data by the total number of inhabitants of each country.

grouza31
04-06-2016, 11:17 AM
Let’s go back to ISOGG, Smal, FTDNA, YFULL R1b tree.
I had noticed that V88+ members are positive for many SNPs that define haplogroup R-M269 and its subclasses. Here are some of these SNPs, but there are many others:
CTS10834, PF6520, PF6469, PF6470, PF6477, PF6479, CTS623, CTS11948 (FGC54 / PF6522), CTS12972 (FGC52 / PF6532) and 2 of R-P297: PF6464 and CTS9018 (FGC188 / PF6484).

Smal gave me the following response. This is also the response of VinceT :
You are not right, all R-V88 including R-Y7771 are negative for these SNPs.

CTS10834, 22796697, all R-V88 have ancestral “T”, all R-M269 have derived “C”
PF6520, 23124367, all R-V88 have ancestral “G”, all R-M269 have derived “T”
PF6469, 17461478, all R-V88 have ancestral “T”, all R-M269 have derived “C”
PF6470, 17594966, all R-V88 have ancestral “C”, all R-M269 have derived “G”
PF6477, 18095336, all R-V88 have ancestral “A”, all R-M269 have derived “C”
PF6479, 18137831, all R-V88 have ancestral “T”, all R-M269 have derived “C”
CTS623, 6912992, all R-V88 have ancestral “T”, all R-M269 have derived “G”
CTS11948, 23379254, all R-V88 have ancestral “G”, all R-M269 have derived “A”
CTS12972, 28771116, all R-V88 have ancestral “C”, all R-M269 have derived “G”
PF6464, 16376495, all R-V88 have ancestral “A”, all R-P297 have derived “C”
CTS9018, 18617596, all R-V88 have ancestral “C”, all R-P297 have derived “T”

After checking the data from more than 200 people from more than 25 haplogroups of the M269 branch and its sub-branches, I realized that this response is not at all true.
Here are some findings :
CTS10834, 22796697, all V88 have “T”, For R-M269: 100 have “T” and 102 others have “C”
PF6520, 23124367, all V88 have “G”, For R-M269: 58 have “G” and 106 others have “T”
PF6469, 17461478, all V88 have “T”, For R-M269: 88 have “T” and 105 others have “C”
PF6470, 17594966, all V88 have “C”, For R-M269: 88 have “C” and 115 others have “G”
PF6477, 18095336, all V88 have “A”, For R-M269: 88 have “A” and 105 others have “C”
PF6479, 18137831, all V88 have “T”, For R-M269: 88 have “T” and 105 others have “C”
CTS623, 6912992, all V88 have “T”, For R-M269: 114 have “T” and 102 others have “G”
PF6464, 16376495, all V88 have “A”, For R-M269: 89 have “A” and 102 others have “C”

My question is simple: How can you explain that people who are V88+ are also positive for many SNPs that are used to define the M269 branch and its sub-branches?

(1) Is it the presence of recurrent back mutations at these SNPs? I had advanced this hypothesis but all of you on this forum told me it's impossible. Some even told me that I understood nothing of the data. Do you still agree to exclude this hypothesis?
(2) The second hypothesis is that the V88 branch is a sub-branch of the M269 branch. This can help to explain why FTDNA found people who are both M269+ and V88+. This can also explain why the ancestral haplogroups of many V88+ people (given by FTDNA) are M269 haplogroups. This can help to explain why my ancestral haplogroup is from Ireland.
(3) The third hypothesis is the existence of errors at the test level. Personally, I exclude this hypothesis because the SNPs found in V88+ people that are used to define M269 and its subclasses number in the many tens. If there is an error on these SNPs, this error will also occur on all SNPs of everyone. It is the easy hypothesis that people often put forward. I hope that this time you will not serve us the same soup again.

smal
04-06-2016, 12:58 PM
There is an error in the interpretation of ancestral/derived values for many markers in BigY data. Sequences are OK, but the interpretation is erroneous. I have sent a request. FTDNA say they know about this problem. I know... They know... Everything is OK, there's no rush.

grouza31
04-06-2016, 01:51 PM
There is an error in the interpretation of ancestral/derived values for many markers in BigY data. Sequences are OK, but the interpretation is erroneous. I have sent a request. FTDNA say they know about this problem. I know... They know... Everything is OK, there's no rush.

Smal, thanks for the information.
Even if FTDNA changes the interpretation of ancestral/derived values, the problem will remain the same for all members of M269 and its subclasses. The SNPs in question become negative for V88 members, but within M269, those who were positive will become negative (thus ancestral) and those who were negative will become positive (thus derived). How can these two groups (ancestral and derived) coexist in the same haplogroup, in the same branch of M269?
Will you consider that those who are ancestral (almost half of the M269 members) will no longer be M269 and will be removed from this branch?

lgmayka
04-06-2016, 03:39 PM
After checking the data from more than 200 people from more than 25 haplogroups of the M269 branch and its sub-branches, I realized that this response is not at all true.
Here are some findings :
CTS10834, 22796697, all V88 have “T”, For R-M269: 100 have “T” and 102 others have “C”
PF6520, 23124367, all V88 have “G”, For R-M269: 58 have “G” and 106 others have “T”
PF6469, 17461478, all V88 have “T”, For R-M269: 88 have “T” and 105 others have “C”
PF6470, 17594966, all V88 have “C”, For R-M269: 88 have “C” and 115 others have “G”
PF6477, 18095336, all V88 have “A”, For R-M269: 88 have “A” and 105 others have “C”
PF6479, 18137831, all V88 have “T”, For R-M269: 88 have “T” and 105 others have “C”
CTS623, 6912992, all V88 have “T”, For R-M269: 114 have “T” and 102 others have “G”
PF6464, 16376495, all V88 have “A”, For R-M269: 89 have “A” and 102 others have “C”

I took a look at the 129 members of the R-U152 group on YFull. Absolutely no group members were negative/ancestral for any of the SNPs you mention (except YF01457, who belongs to haplogroup J and joined the R1b-U152 group by mistake).

Where are you getting your bad data?

smal
04-06-2016, 04:37 PM
Where are you getting your bad data?

The problem is in CSV files, Y-Haplotree, and SNP project pages, not in BAM, VCF files, or YFull system.

For example,

R-M269- sample
SNP Name | Derived? | On Y-Tree? | Reference | Genotype | Confidence
CTS10834 | Yes (+) | Yes | C | T | High

R-M269+ sample
SNP Name | Derived? | On Y-Tree? | Reference | Genotype | Confidence
CTS10834 | No (-) | Yes | C | C | High

However, the opposite should be the case. The reference sequence is derived for this marker. The correct direction of the mutation should be CTS10834 (T->C).
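
A minimal Python sketch of the two ways of calling CTS10834. The function names are hypothetical, and `naive_call` is only an assumption about what the faulty interpretation does, chosen because it reproduces the two report rows above. It is not FTDNA's actual code.

```python
def naive_call(genotype, reference):
    """Assumed faulty logic: anything that differs from the reference is 'derived'."""
    return "derived (+)" if genotype != reference else "ancestral (-)"

def polarised_call(genotype, ancestral, derived):
    """Call the state from the known direction of the mutation (ancestral -> derived)."""
    if genotype == derived:
        return "derived (+)"
    if genotype == ancestral:
        return "ancestral (-)"
    return "unknown"

# CTS10834: the mutation is T -> C, and the reference sequence carries the derived C.
reference, ancestral, derived = "C", "T", "C"

for label, genotype in [("R-M269- sample", "T"), ("R-M269+ sample", "C")]:
    print(label, naive_call(genotype, reference),
          polarised_call(genotype, ancestral, derived))
```

The underlying sequence read is the same in both cases; only the label attached to it flips, which is why the BAM and VCF files are unaffected.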

razyn
04-06-2016, 05:22 PM
However, the opposite should be the case. The reference sequence is derived for this marker.
A couple of months ago, I started a thread that attempted to address this issue. I updated it today with several similarly problematic positions that I've run across more recently. http://www.anthrogenica.com/showthread.php?6425-BigY-Novel-Variants-winnow-your-list-to-make-it-useful

VinceT
04-06-2016, 06:13 PM
The problem is in CSV files, Y-Haplotree, and SNP project pages, not in BAM, VCF files, or YFull system.

For example,

R-M269- sample
SNP Name | Derived? | On Y-Tree? | Reference | Genotype | Confidence
CTS10834 | Yes (+) | Yes | C | T | High

R-M269+ sample
SNP Name | Derived? | On Y-Tree? | Reference | Genotype | Confidence
CTS10834 | No (-) | Yes | C | C | High

However, the opposite should be the case. The reference sequence is derived for this marker. The correct direction of the mutation should be CTS10834 (T->C).

Absolutely! The data analysts in R1b-U106 Project had noticed that FTDNA's interpretation of Big Y data was partially flawed ever since the first dozen Big Y tests were complete. DO NOT take FTDNA's interpretation at face value.

Musa
04-07-2016, 08:11 AM
I found the topic very interesting. Initially I was tested and confirmed +ve for M269 (all the SNPs) with FTDNA; later, after Big Y, I became +PF6289. Also, even now my brother at 37 panels is +M269. What is going on with FTDNA's prediction?

grouza31
04-07-2016, 10:26 AM
Thank you all. As the BAM files are correct, I repeat my question again: what are you going to do now with people who are classified as R-M269 and its subclasses, and who after correction of the ancestral/derived values by FTDNA will become negative (ancestral) for many SNPs that are used to define this M269 branch?
Are you going to take them out of this M269 branch?

How has YFULL dealt with this problem? Even if YFULL does not have this problem of interpretation of ancestral/derived values, the problem remains and will be the same as at FTDNA after correction of its interpretation software.
If you keep these people in the M269 branch and its sub-branches, you have to recognize in this case that the V88 are also M269+.

grouza31
04-07-2016, 01:21 PM
I found the topic very interesting. Initially I was tested and confirmed +ve for M269 (all the SNPs) with FTDNA; later, after Big Y, I became +PF6289. Also, even now my brother at 37 panels is +M269. What is going on with FTDNA's prediction?
Musa, you are in the same situation as many V88+ people, more specifically those who are PF6289+. I have the impression that for these PF6289+ people there is no clear border with a lot of M269+ people, because they share many SNPs (with the same alleles) at the M269 level. We notice the same similarity between these two groups of people with the Y-STR test, where they are often exact matches.
When people classified as PF6289+ wondered why they match some M269+ people, they were told that Y-STR tests are not very reliable and that they need to do SNP tests. But when they do the SNP tests and find the same similarities as with the Y-STR test, people quickly classify the SNPs that connect the two groups (PF6289+ and some M269+) as private SNPs and do not take them into account in the development of the Y trees.
I had written to FTDNA about this same problem over a year ago. FTDNA replied that we find the same similarity because PF6289+ and M269+ are the same group.

lgmayka
04-07-2016, 02:08 PM
Initially I was tested and confirmed +ve for M269 (all the SNPs) with FTDNA; later, after Big Y, I became +PF6289.
YFull places PF6289 on the exact same level as V88 (https://yfull.com/tree/R-V88/). Have you submitted your BAM file to YFull for analysis?

Megalophias
04-07-2016, 04:07 PM
If you keep these people in the M269 branch and its sub-branches, you have to recognize in this case that the V88 are also M269+.
Above you only showed ancestral SNPs shared between M269 and V88. V88 shares these same SNPs with everything - H and E and B and A00 and frigging chimpanzees. You can no more argue that V88 is a branch of M269 based on shared ancestral alleles than you can argue it is introgressed from gorillas.

smal
04-07-2016, 04:26 PM
grouza31, Big Y tested V88 people do not have any M269 equivalent markers; R-V88 and R-M269 are completely different branches.

However, there are a number of samples from R-V88 or R-M73 that have false positive results for the M269 marker (mostly from the Y-HAP-Backbone test). They are most probably lab errors. For example, 306159 Mansour.

grouza31
04-07-2016, 06:58 PM
Above you only showed ancestral SNPs shared between M269 and V88. V88 shares these same SNPs with everything - H and E and B and A00 and frigging chimpanzees. You can no more argue that V88 is a branch of M269 based on shared ancestral alleles than you can argue it is introgressed from gorillas.

What you say is false. V88+ share many derived SNPs with M269+ people and these derived SNPs are on M269 and its subclasses. How can you also accept that in R-L150 or R-P311 or R-CTS4520 you have one part ancestral and other part derived for the same SNP? It is only you who can accept this.

Megalophias
04-07-2016, 07:18 PM
What you say is false. V88+ share many derived SNPs with M269+ people and these derived SNPs are on M269 and its subclasses. How can you also accept that in R-L150 or R-P311 or R-CTS4520 you have one part ancestral and other part derived for the same SNP? It is only you who can accept this.
Of course it shares many derived SNPs with M269, they are both R1b. But for V88 to be a branch of M269 it would have to have derived SNPs shared with M269 which are ancestral in M73 and L389. All you posted above was ancestral SNPs reported in M269 samples, which are relevant only within M269.
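
The logic can be sketched with sets of derived alleles. The marker sets below are a heavily simplified illustration built from a few SNP names mentioned in this thread, not a complete or authoritative tree, and `nesting_support` is a hypothetical helper name.

```python
# Simplified derived-allele sets per clade (illustrative only).
derived = {
    "V88":  {"M343", "L754", "V88"},
    "M73":  {"M343", "L754", "L389", "P297", "M73"},
    "M269": {"M343", "L754", "L389", "P297", "M269", "CTS10834"},
}

def nesting_support(candidate, parent, outgroup):
    """Derived alleles shared by candidate and parent but ancestral in the outgroup."""
    return (derived[candidate] & derived[parent]) - derived[outgroup]

# V88 and M269 do share derived alleles, because both are R1b ...
shared = derived["V88"] & derived["M269"]

# ... but none of them is ancestral in M73, so nothing places V88 inside M269.
support_v88_in_m269 = nesting_support("V88", "M269", "M73")

# By contrast, M73 and M269 share derived alleles that V88 lacks.
support_m73_with_m269 = nesting_support("M73", "M269", "V88")

print(sorted(shared), sorted(support_v88_in_m269), sorted(support_m73_with_m269))
```

Shared ancestral alleles never appear in these sets at all, which is the point: only shared derived alleles that exclude some other lineage say anything about branching order.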

I think it's an error by FTDNA, because there are lots of other full sequencing studies that have analyzed lots of M269 individuals and none of them have mentioned any kind of problem with these SNPs. Anyway, I don't know or care much about FTDNA's testing and reporting, and you don't seem to be interested in anything else, so I am not going to be of further use in this conversation.

grouza31
04-07-2016, 07:56 PM
Above you only showed ancestral SNPs shared between M269 and V88. V88 shares these same SNPs with everything - H and E and B and A00 and frigging chimpanzees. You can no more argue that V88 is a branch of M269 based on shared ancestral alleles than you can argue it is introgressed from gorillas.

What you say is false. V88+ share many derived SNPs with M269+ people and these derived SNPs are on M269 and its subclasses. How can you also accept that in R-L150 or R-P311 or R-CTS4520 you have one part ancestral and other part derived for the same SNP? It is only you who can accept this.

ArmandoR1b
04-07-2016, 11:02 PM
What you say is false. V88+ share many derived SNPs with M269+ people and these derived SNPs are on M269 and its subclasses. How can you also accept that in R-L150 or R-P311 or R-CTS4520 you have one part ancestral and other part derived for the same SNP? It is only you who can accept this.
smal has already explained this to you. It isn't only Megalophias that accepts this but everyone here that has been arguing with you. Every single one of us agrees with Megalophias.

grouza31
04-08-2016, 08:51 AM
smal has already explained this to you. It isn't only Megalophias that accepts this but everyone here that has been arguing with you. Every single one of us agrees with Megalophias.

I accepted Smal's explanations concerning FTDNA lab errors on V88+.
The problem that I pointed out concerns more the members of M269+. Maybe I have not explained myself very well. I present it to you again, and excuse me for dwelling on this subject.

Between 30 and 40% of people are currently classified as M269+ and have their haplogroups in the M269 branch and its sub-branches. These 30-40% have ancestral values (after correction of the FTDNA interpretation error as proposed by Smal and others) at many markers that are at the same level as the M269 marker. The ancestral value of some of them was confirmed by YSEQ tests on some members of the R-Z255 group (for example marker CTS10834).
Smal had explained to me that in a given branch (Br1) of the Y haplotree, he indicates only markers with derived value. So the people who are on this branch (Br1) or on its sub-branches (BR1, Br2, ...Brn) are only those who have the markers with derived value at the main branch Br1.
Let’s forget the V88+ topics.
My question was: are you still going to keep these 30-40% of people classified as being on the M269 branch and its sub-branches who, even after the correction of FTDNA's errors of interpretation, will automatically have ancestral values at some of their markers that are at the same level as the M269 marker? Or will you consider this fact as also lab errors? Or are you going to delete these markers at the M269 level?
Apparently, according to Smal, the problem does not arise for M269+ people who took the Big Y test. Can I deduce that the 30 to 40% of contentious cases are all those who have taken single SNP tests and the backbone test?

Further clarification: I have never said that V88 descended from M269. It was a hypothesis that I had put forward to explain some of the bizarre facts encountered among V88+ and M269+ people.
The purpose of these discussions is to refute or confirm this hypothesis. I am in no way trying to be right. Rather, I'm looking for convincing explanations of the facts.

grouza31
04-09-2016, 12:21 PM
YFull, thank you very much for your R1-V88 tree showing that (if I’m not mistaken) all currently known V88+ people (African, Middle Eastern, European) descended from a European man, specifically an Italian.
The sample id:ERS256975 (in your tree) is a living example of this ancestor, who may have many descendants who are still in Europe, not only in Africa and the Middle East.

lgmayka
04-09-2016, 02:15 PM
YFull, thank you very much for your R1-V88 tree showing that (if I’m not mistaken) all currently known V88+ people (African, Middle Eastern, European) descended from a European man, specifically an Italian.
Not exactly. For one thing, the three Italian entries (ERS256975, ERS256965, and ERS256961) are more specifically Sardinian research samples. And secondly, they aren't ancestors of the other entries, but more like siblings. (In a haplotree shaped like this, I often call them offshoots.)

But the haplotree does clearly show that:
- 10,000 years ago, R-V88 split into 3 different lineages that survive today. But 2 of them have (so far) only been found on Sardinia.
- The third lineage split into 2 surviving lineages about 7500 years ago, but one of those has been found (so far) only on Sardinia.

One possible interpretation of this data is that R-V88 has been in Europe or the Mediterranean for at least 10,000 years.

Of course, this haplotree is greatly skewed by the fact that Sardinian Y-DNA has been more heavily studied, per capita, than any other on the planet. :)

smal
04-09-2016, 03:20 PM
R-V88 split into 3 different lineages that survive today.

It is clear the quality of ERS256975 is not good enough for it to be assigned to R-M18, R-Y7777, or R-V88*. You cannot say that it belongs to a third line.

Megalophias
04-09-2016, 03:33 PM
Not exactly. For one thing, the three Italian entries (ERS256975, ERS256965, and ERS256961) are more specifically Sardinian research samples. And secondly, they aren't ancestors of the other entries, but more like siblings. (In a haplotree shaped like this, I often call them offshoots.)

But the haplotree does clearly show that:
- 10,000 years ago, R-V88 split into 3 different lineages that survive today. But 2 of them have (so far) only been found on Sardinia.
- The third lineage split into 2 surviving lineages about 7500 years ago, but one of those has been found (so far) only on Sardinia.
The sample currently in the V88* position is actually labelled "in progress"; in the original study all the Sardinian V88 fell cleanly into 2 clades, so I expect once all the dust is settled there will be 2 branches, one under PF6282 and the other under Y7777.

Also, M18 has previously been reported from Lebanon, so both branches are found in the Near East as well.

Silesian
04-09-2016, 03:43 PM
YFull, thank you very much for your R1-V88 tree showing that (if I’m not mistaken) all currently known V88+ people (African, Middle Eastern, European) descended from a European man, specifically an Italian.
The sample id:ERS256975 (in your tree) is a living example of this ancestor, who may have many descendants who are still in Europe, not only in Africa and the Middle East.

http://sardinianpeople.blogspot.ca/2012/11/genetics-y-dna-haplogroups-distribution.html

It is becoming clear that we have a time challenge for Cagliari I-M26 I-L158 https://www.yfull.com/tree/I-L158/
R1b has already been found in Neolithic Spain at Els Trocs, with different autosomal properties [Eurogenes K15] than Neolithic Y-DNA I2, and now we have confirmed R1b-V88 in Cagliari https://www.yfull.com/tree/R-V88/
R-V88: Y7789/FGC21066 * PF6338 * FGC20993/Y7786 + 74 SNPs; formed 17200 ybp, TMRCA 10000 ybp
I-L158: PF4069 * CTS7472/PF4025 * PF3875 + 139 SNPs; formed 18500 ybp, TMRCA 11100 ybp

Silesian
04-09-2016, 04:47 PM
http://sardinianpeople.blogspot.ca/2012/11/genetics-y-dna-haplogroups-distribution.html

It is becoming clear that we have a time challenge for Cagliari I-M26 I-L158 https://www.yfull.com/tree/I-L158/
R1b has already been found at Els Trocs in Neolithic Spain, with different autosomal properties [Eurogenes K15] than the Neolithic Y-DNA I2 samples, and now we have confirmed R1b-V88 in Cagliari https://www.yfull.com/tree/R-V88/
R-V88: Y7789/FGC21066 * PF6338 * FGC20993/Y7786 +74 SNPs; formed 17200 ybp, TMRCA 10000 ybp
I-L158: PF4069 * CTS7472/PF4025 * PF3875 +139 SNPs; formed 18500 ybp, TMRCA 11100 ybp

Follow the trail of snps.

Also just a side point. The other old branch of Y-DNA I found in Cagliari, Sardinia is I-S6635: S6685 * PF3935 * PF3918 +30 SNPs; formed 15600 ybp, TMRCA 10200 ybp
https://www.yfull.com/tree/I-L596/

I-L596 has derived Y-SNP support in a sample from Barcın, Anatolia (Turkey) [6500-6200 BCE] - http://www.nature.com/nature/journal/v528/n7583/full/nature16152.html
https://docs.google.com/spreadsheets/d/1Vjbp450AwI7R-Y9J1YGSm9FjJWu9s9lx1azjUJbS8hQ/edit#gid=1862162288

This probably explains why no R1b-V88 was found in the Barcın (Anatolia) portion of the Mathieson et al. 2015 samples, while younger branches of R1b-V88 are found among Saudi samples. Compare the difference in the East Med and Red Sea components between the Els Trocs R1b and I2 samples in the K15 runs.

http://eurogenes.blogspot.ca/2015/06/k8-results-for-selected-allentoft-et-al.html

https://docs.google.com/spreadsheets/d/1x8pm8sVcHqceiNFJMO082kxaBF5ePr4__bAK05VQRFw/edit#gid=1681484272

Sardinian samples 1705-1728 a mix of WHG and Near Eastern.

Musa
04-17-2016, 02:45 PM
Yes! I've done it. Now I belong to R-Y18458, a subclade of V88.

grouza31
04-17-2016, 09:12 PM
The sample currently in the V88* position is actually labelled "in progress"; in the original study all the Sardinian V88 fell cleanly into 2 clades, so I expect once all the dust is settled there will be 2 branches, one under PF6282 and the other under Y7777.

Also, M18 has previously been reported from Lebanon, so both branches are found in the Near East as well.
What exactly do you mean? Do you think PF6289 and Y7777 are two separate branches of V88? What is ironic is that many Y7777+ people (in the YFull tree) are also PF6289+.

Megalophias
04-18-2016, 05:12 AM
What exactly do you mean? Do you think PF6289 and Y7777 are two separate branches of V88? What is ironic is that many Y7777+ people (in the YFull tree) are also PF6289+.
PF6289 is equivalent to V88, as far as I know.

Francalacci's study of Sardinians found 2 branches: one is V35 - which is a sub-branch of Y7771 - the other doesn't have an official defining SNP, I called it PF6282 because that is the first SNP listed at that level in smal's tree. However, the frequency of M18 and V35 in previous studies of Sardinian Y DNA suggest that it is mostly if not entirely M18, at least in Sardinia, and confirming this Y Full has one of those Sardinians presently listed under M18 (which is a deletion rather than a SNP).

grouza31
04-18-2016, 08:59 PM
PF6289 is equivalent to V88, as far as I know.

Francalacci's study of Sardinians found 2 branches: one is V35 - which is a sub-branch of Y7771 - the other doesn't have an official defining SNP, I called it PF6282 because that is the first SNP listed at that level in smal's tree. However, the frequency of M18 and V35 in previous studies of Sardinian Y DNA suggest that it is mostly if not entirely M18, at least in Sardinia, and confirming this Y Full has one of those Sardinians presently listed under M18 (which is a deletion rather than a SNP).

What allele of M18 marker is found in sardinian V88+ people ? AA or del del?

Megalophias
04-18-2016, 10:44 PM
What allele of M18 marker is found in sardinian V88+ people ? AA or del del?
Both.

grouza31
04-19-2016, 08:50 AM
Both.

Do you know the M18 marker allele value of the sample id:ERS256965 classified by YFULL as under M18 branch? Is it AA or del del?

Megalophias
04-19-2016, 04:12 PM
Do you know the M18 marker allele value of the sample id:ERS256965 classified by YFULL as under M18 branch? Is it AA or del del?
I don't work for Y-Full, you'd have to ask them. I doubt they'd list it as M18 without checking but who knows.

OK, upon further checking I think I was wrong the first time - M18 is the AA insertion, so the ancestral state is listed as a deletion (confusingly).

grouza31
04-19-2016, 04:38 PM
I don't work for Y-Full, you'd have to ask them. I doubt they'd list it as M18 without checking but who knows.

OK, upon further checking I think I was wrong the first time - M18 is the AA insertion, so the ancestral state is listed as a deletion (confusingly).
As there are both M18 AA and M18 del del in Sardinian V88+ people, what are the percentages of M18 AA and M18 del del?

Megalophias
04-19-2016, 10:47 PM
You can Google this stuff you know!

From Francalacci et al 2013 (n=1200) out of 29 V88 men 11 (38%) had V35 and 18 (62%) had the other, presumably M18, clade, but this is only an indirect measure as M18 itself was not tested.
Low-Pass DNA Sequencing of 1200 Sardinians Reconstructs European Y-Chromosome Phylogeny (http://science.sciencemag.org/content/341/6145/565)
From Contu et al 2008 (n=376) 1.6% of males born in Cagliari, 4.9% of males born in Sorgono, and 0% of males born in Tempio were M18+. V88 was not tested for, there is also R1(xM17, M18, M269) which is probably V88+ M18-.
Y-Chromosome Based Evidence for Pre-Neolithic Origin of the Genetically Homogeneous but Diverse Sardinian Population: Inference for Association Scans (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0001430)

grouza31
04-20-2016, 09:14 PM
You can Google this stuff you know!

From Francalacci et al 2013 (n=1200) out of 29 V88 men 11 (38%) had V35 and 18 (62%) had the other, presumably M18, clade, but this is only an indirect measure as M18 itself was not tested.
Low-Pass DNA Sequencing of 1200 Sardinians Reconstructs European Y-Chromosome Phylogeny (http://science.sciencemag.org/content/341/6145/565)
From Contu et al 2008 (n=376) 1.6% of males born in Cagliari, 4.9% of males born in Sorgono, and 0% of males born in Tempio were M18+. V88 was not tested for, there is also R1(xM17, M18, M269) which is probably V88+ M18-.
Y-Chromosome Based Evidence for Pre-Neolithic Origin of the Genetically Homogeneous but Diverse Sardinian Population: Inference for Association Scans (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0001430)

Thanks.

grouza31
04-20-2016, 09:22 PM
When M18 was recorded at dbSNP, no ancestral value was specified. M18 del represented 99.86% and M18 AA 0.14%. Can anyone tell us on what basis it was later decided that del is the ancestral value of M18? M18 AA is very rare even in Sardinia. Being so rare, on what basis has it been written all over that the Sardinian V88+ is dominated by the M18 branch?

Megalophias
04-21-2016, 02:10 AM
When M18 was recorded at dbSNP, no ancestral value was specified. M18 del represented 99.86% and M18 AA 0.14%. Can anyone tell us on what basis it was later decided that del is the ancestral value of M18?
Because everyone with A, E, F, G, I, J, R1a, R1b-M269, etc has del there. Only a subset of people with V88 have AA. Therefore del is ancestral and AA is derived (and under V88). This is very very basic stuff.
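The reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the haplogroup list is the one named in the post, while the data layout and the function names (`infer_ancestral`, `lineages_carrying`) are hypothetical, and the rule used (the one allele seen in every sampled lineage is taken as ancestral) is a simplification of a proper parsimony analysis on the tree.

```python
# Hypothetical observations: alleles seen at the M18 site per haplogroup.
# The haplogroups follow the post; the dictionary layout is an assumption.
OBSERVED = {
    "A": {"del"}, "E": {"del"}, "F": {"del"}, "G": {"del"},
    "I": {"del"}, "J": {"del"}, "R1a": {"del"}, "R1b-M269": {"del"},
    "R1b-V88": {"del", "AA"},  # only a subset of V88 carries AA
}


def infer_ancestral(observed):
    """Return (ancestral, derived): the single allele seen in every lineage, and the rest."""
    shared = set.intersection(*observed.values())
    if len(shared) != 1:
        # No unique allele common to all lineages: this simple rule cannot decide.
        return None, set()
    ancestral = next(iter(shared))
    derived = set.union(*observed.values()) - shared
    return ancestral, derived


def lineages_carrying(observed, allele):
    """List the lineages in which the given allele was observed."""
    return sorted(h for h, alleles in observed.items() if allele in alleles)


ancestral, derived = infer_ancestral(OBSERVED)
print("ancestral:", ancestral)
print("derived:", derived, "found in", lineages_carrying(OBSERVED, "AA"))
```

With this input the derived allele is confined to one lineage, which is the pattern the post describes for AA under V88.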


Being so rare, on what basis has it been written all over that the Sardinian V88+ is dominated by the M18 branch?
It is a higher *proportion* of the rare haplogroup V88 in Sardinia than M18- is. Look at the Contu study I linked above, for instance.

grouza31
04-21-2016, 09:34 AM
Because everyone with A, E, F, G, I, J, R1a, R1b-M269, etc has del there. Only a subset of people with V88 have AA. Therefore del is ancestral and AA is derived (and under V88). This is very very basic stuff.


It is a higher *proportion* of the rare haplogroup V88 in Sardinia than M18- is. Look at the Contu study I linked above, for instance.
I read the Contu study. Nothing in it concludes that there are more V88+ M18+ than V88+ M18- in Sardinia. All currently available data show that the V88+ in Sardinia are more M18-. Smal does not recognize any M18+ in his haplotree and YFull found only one.

grouza31
04-21-2016, 03:14 PM
Because everyone with A, E, F, G, I, J, R1a, R1b-M269, etc has del there. Only a subset of people with V88 have AA. Therefore del is ancestral and AA is derived (and under V88). This is very very basic stuff.


Are you sure that all the haplogroups you cited have M18 del?
Many have T as the M18 allele.

Megalophias
04-21-2016, 03:32 PM
T is the ancestral allele at 21733162. M18 is an insertion at that location, not a SNP which changes T into another base.


All currently available data show that the V88+ in Sardinia are more M18-
OK you are just wasting my time. I'm done.

grouza31
04-21-2016, 08:18 PM
T is the ancestral allele at 21733162. M18 is an insertion at that location, not a SNP which changes T into another base.


OK you are just wasting my time. I'm done.
I'm not wasting your time. Your explanations and your opinion are very important and very useful. I just wanted your opinion on the current data, most of which comes from other people's use of Francalacci's work and which does not fully reflect the existence of V88+ M18+ (as in the haplotrees of Smal and YFull). In my particular case, FGC gave me R-M18 as the haplogroup match after analysing and interpreting my BigY BAM file. But Smal refuted this classification, and YFull did not retain R-M18 either. FGC compared my markers with the R-M18 samples they had.

grouza31
04-21-2016, 08:45 PM
T is the ancestral allele at 21733162. M18 is an insertion at that location, not a SNP which changes T into another base.


What could explain the following case:
BigY test result with FTDNA: no read for M18 at 21733162.
BigY BAM file analysis by YFull: M18 T at 21733162.
Test with YSEQ: M18 del at 21733162.

VinceT
04-21-2016, 09:21 PM
What could explain the following case:
BigY test result with FTDNA: no read for M18 at 21733162.
BigY BAM file analysis by YFull: M18 T at 21733162.
Test with YSEQ: M18 del at 21733162.

FTDNA does not call insertions or deletions from Big Y NGS data. Their programmers aren't clever or experienced enough to sort out good indels from bad indels, so they simply ignore all of them.

YFull does not call insertions accurately from NGS data, because they don't report bases that are situated between indexed positions.

YSEQ does call insertions accurately.

FGC does call insertions accurately (usually, when possible), but their method of reporting is a bit different than YSEQ, i.e. reference "T", alternate "TAA" at position 21733162 to be positive for M18. [Specifically, there is an extra "AA" inserted between the "T" at 21733162 and the "A" at 21733163]

tl;dr: Indels (particularly insertions) are tricky to call correctly from NGS data.

smal
04-21-2016, 09:29 PM
What can be your explanations of the following case:
BigY Bam file analysis by YFULL result : M18 T at 21733162.
Test with YSEQ result: M18 del at 21733162

T = del

grouza31
04-21-2016, 09:31 PM
FTDNA does not call insertions or deletions in Big Y NGS data.
YFull does not call insertions accurately in NGS data.
YSEQ does call insertions accurately.

So which result is correct? YSEQ?

VinceT
04-21-2016, 09:45 PM
So which result is correct? YSEQ?
See edit for post #104 (http://www.anthrogenica.com/showthread.php?6479-ISOGG-2016-Y-DNA-haplogroup-R1b-tree&p=152274&viewfull=1#post152274) above.

YSEQ and FGC should call M18 correctly.
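To make the point about reporting styles concrete, here is a minimal Python sketch that maps the differently worded reports from this thread onto one M18 status. The token sets and the function name `m18_status` are hypothetical; they only cover the notations quoted above ("T" and "del" for the state without the insertion, "TAA" or "AA" for the insertion) and are not any lab's actual output format.

```python
# Tokens as quoted in this thread; treating them as a fixed vocabulary is an assumption.
NEGATIVE_CALLS = {"T", "del", "-"}   # reference base only, i.e. no AA insertion
POSITIVE_CALLS = {"TAA", "AA"}       # AA inserted after the T at 21733162


def m18_status(call):
    """Translate one lab-style report into 'positive', 'negative' or 'unknown'."""
    if call is None:
        return "unknown"             # no read / indel not called
    token = call.strip()
    if token in NEGATIVE_CALLS:
        return "negative"
    if token in POSITIVE_CALLS:
        return "positive"
    return "unknown"


# The three results from the question above.
reports = {"FTDNA Big Y": None, "YFull": "T", "YSEQ": "del"}
statuses = {lab: m18_status(call) for lab, call in reports.items()}
informative = {s for s in statuses.values() if s != "unknown"}

for lab, status in statuses.items():
    print(f"{lab}: {status}")
print("consistent:", len(informative) <= 1)
```

Read this way, the YFull "T" and the YSEQ "del" agree with each other, and the FTDNA "no read" simply adds no information.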

VinceT
04-21-2016, 09:58 PM
Here is the dbSNP VCF format for M18+:



#CHROM POS ID REF ALT QUAL FILTER INFO
Y 21733162 rs3909 T TAA . . RS=3909;RSPOS=21733162;dbSNPBuildID=36;SSR=0;SAO=0;VP=0x050000080005000402000200;GENEINFO=TXLNGY:246126;WGT=1;VC=DIV;INT;ASP;HD
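As a rough illustration of how such a record can be read, the Python sketch below compares the REF and ALT fields and reports the extra bases. It assumes tab-separated VCF columns and a single ALT allele that shares its leading base with REF (as in the record above); the function name `classify_vcf_record` and the shortened INFO field are hypothetical.

```python
def classify_vcf_record(line):
    """Classify a single-ALT VCF data line as insertion, deletion, snv or complex."""
    chrom, pos, rsid, ref, alt = line.rstrip("\n").split("\t")[:5]
    if len(alt) > len(ref) and alt.startswith(ref):
        kind, bases = "insertion", alt[len(ref):]   # bases added after REF
    elif len(ref) > len(alt) and ref.startswith(alt):
        kind, bases = "deletion", ref[len(alt):]    # bases removed after ALT
    elif len(ref) == len(alt) == 1:
        kind, bases = "snv", alt
    else:
        kind, bases = "complex", alt
    return {"chrom": chrom, "pos": int(pos), "id": rsid, "kind": kind, "bases": bases}


# The M18 record from the post, with the INFO column shortened for the example.
RECORD = "\t".join(["Y", "21733162", "rs3909", "T", "TAA", ".", ".", "VC=DIV"])
m18 = classify_vcf_record(RECORD)
print(m18)
```

For the M18 record this gives an insertion of "AA" anchored at 21733162, matching the description of the extra "AA" between 21733162 and 21733163.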

Megalophias
04-21-2016, 10:45 PM
You want help interpreting your BigY results...
So you make baseless claims about the paternal ancestry of Sardinians to random people on the Internet.
In this dimension that is called a waste of freaking time.

You are M18 negative. What subclade did Y-Full put you in? What did FGC's haplogroup assignment actually say?

grouza31
04-22-2016, 08:16 AM
You want help interpreting your BigY results...
So you make baseless claims about the paternal ancestry of Sardinians to random people on the Internet.
In this dimension that is called a waste of freaking time.

You are M18 negative. What subclade did Y-Full put you in? What did FGC's haplogroup assignment actually say?
Avoid getting upset and insulting people in debates. I never asked anyone in these debates to help me interpret my test results; if I need to, I know whom to ask. I try to identify the inconsistencies that often occur in DNA testing and interpretation, and I often use the case of V88, which I know, and my own particular case to do so. What do you want? For people to foolishly accept what you say without any spirit of contradiction or analysis?
People say and write a lot about genetics, and particularly about V88, without knowing anything.
I never said that my ancestors come from Sardinia. This is only a hypothesis on the origin of V88, and I'm not the first to make it. Others say that V88 originates in Africa.

Megalophias
04-22-2016, 05:48 PM
Avoid getting upset and insulting people in debates. I never asked anyone in these debates to help me interpret my test results; if I need to, I know whom to ask. I try to identify the inconsistencies that often occur in DNA testing and interpretation, and I often use the case of V88, which I know, and my own particular case to do so. What do you want? For people to foolishly accept what you say without any spirit of contradiction or analysis?
People say and write a lot about genetics, and particularly about V88, without knowing anything.
I never said that my ancestors come from Sardinia. This is only a hypothesis on the origin of V88, and I'm not the first to make it. Others say that V88 originates in Africa.

Analysis is good, and identifying inconsistencies in test results is good, but many of your posts are asserting things without sufficient evidence or explanation. Instead of asking clearly you have repeated some contradiction until finally people figure out what you are talking about. This is a very frustrating way to have a discussion, I'm sure that is not your intention but that is how it comes across. (I am not saying all your posts are like this, many have been clear.) I linked the studies of Sardinian Y-DNA and explained how I came to the conclusion that M18 is probably more common, and you say "All currently available data show that the V88+ in Sardinia are more M18-" without telling us what this data is or how you are calculating your result. Instead you mention the trees of Y-Full and smal, which can hardly be relevant to calculating frequencies. You say that many people are speaking in ignorance about V88 but you do not tell us what they are saying that is wrong or why it is wrong, and you do not tell us what your hypothesis is and what evidence you have for it.

I find V88 very interesting and I would like to know what theories you (and other people) have about it and what is the evidence for and against (I have no strong opinion myself), but not in this pointlessly confusing and meandering way.

I will explain the Sardinian V88 frequency in more detail:

Contu tested for M18; he did not test for V88, because it was not even known at the time. It was Cruciani et al. in 2010 who found that most of what was previously R1b* could be united by V88. But his data are still useful. Altogether Sardinian R1b-M18 was 2.1% (8/376) and R1*(xM17, M18, M269) was 1.1% (4/376). We don't know for sure what the R1* is, but already we can see that even if it is *all* V88+ M18-, there is *still* more M18+ than M18-. Everything else is some other haplogroup and so can't be V88 (there is some that is not completely analyzed but that is useless for us).

Francalacci did a very large and detailed study, but the study design did not allow identification of indels, so no call for M18, but we can identify V88; altogether there was 0.9% V35 (11/1200) and 1.5% (18/1200) PF6361. We know already from Cruciani that V35 is separate from M18, but we cannot tell whether this other branch PF6361 is M18+ or -. Also there is no evidence of back mutations or recurrent mutations which could confuse the identification of V88.

Now putting them together we can see the difference between 1.1% R1* and 0.9% V35, and between 2.1% M18 and 1.5% PF6361, is not significant, and so it is likely that M18=PF6361, or at least one is a major subclade of the other. Francalacci says this, by the way, it is not just me coming up with it. But in any case in Contu's sample we know that M18+ is *at least* 67% of V88 (8/12), and in Francalacci's sample M18 is *up to* 62% (18/29) of V88, not contradicting the result of Contu. So M18 is most common in Sardinian V88.
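For anyone who wants to check the arithmetic, here is a small Python sketch using only the counts quoted in this post (8/376 and 4/376 from Contu, 18/1200 and 11/1200 from Francalacci). The choice of a pooled two-proportion z-test is my assumption, not something either study reports, and with counts this small it is only a rough approximation (an exact test would be preferable).

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return z, p_value


CONTU_N, FRANCALACCI_N = 376, 1200

# M18 (Contu) versus the non-V35 V88 clade (Francalacci).
z_m18, p_m18 = two_proportion_z(8, CONTU_N, 18, FRANCALACCI_N)
# R1*(xM17, M18, M269) (Contu) versus V35 (Francalacci).
z_other, p_other = two_proportion_z(4, CONTU_N, 11, FRANCALACCI_N)

# Share of M18 within V88: lower bound from Contu, upper bound from Francalacci.
m18_share_contu_min = 8 / (8 + 4)
m18_share_francalacci_max = 18 / (18 + 11)

print(f"M18 vs other clade: z={z_m18:.2f}, p={p_m18:.2f}")
print(f"R1* vs V35:         z={z_other:.2f}, p={p_other:.2f}")
print(f"M18 share: >= {m18_share_contu_min:.0%} (Contu), <= {m18_share_francalacci_max:.0%} (Francalacci)")
```

Both p-values come out well above 0.05, which is consistent with the statement that the differences between the two studies are not significant.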

Now if you have evidence to the contrary, I would be happy to see it.

grouza31
05-04-2016, 02:07 PM
Analysis is good, and identifying inconsistencies in test results is good, but many of your posts are asserting things without sufficient evidence or explanation. Instead of asking clearly you have repeated some contradiction until finally people figure out what you are talking about. This is a very frustrating way to have a discussion, I'm sure that is not your intention but that is how it comes across. (I am not saying all your posts are like this, many have been clear.) I linked the studies of Sardinian Y-DNA and explained how I came to the conclusion that M18 is probably more common, and you say "All currently available data show that the V88+ in Sardinia are more M18-" without telling us what this data is or how you are calculating your result. Instead you mention the trees of Y-Full and smal, which can hardly be relevant to calculating frequencies. You say that many people are speaking in ignorance about V88 but you do not tell us what they are saying that is wrong or why it is wrong, and you do not tell us what your hypothesis is and what evidence you have for it.

I find V88 very interesting and I would like to know what theories you (and other people) have about it and what is the evidence for and against (I have no strong opinion myself), but not in this pointlessly confusing and meandering way.

I will explain the Sardinian V88 frequency in more detail:

Contu tested for M18; he did not test for V88, because it was not even known at the time. It was Cruciani et al. in 2010 who found that most of what was previously R1b* could be united by V88. But his data are still useful. Altogether Sardinian R1b-M18 was 2.1% (8/376) and R1*(xM17, M18, M269) was 1.1% (4/376). We don't know for sure what the R1* is, but already we can see that even if it is *all* V88+ M18-, there is *still* more M18+ than M18-. Everything else is some other haplogroup and so can't be V88 (there is some that is not completely analyzed but that is useless for us).

Francalacci did a very large and detailed study, but the study design did not allow identification of indels, so no call for M18, but we can identify V88; altogether there was 0.9% V35 (11/1200) and 1.5% (18/1200) PF6361. We know already from Cruciani that V35 is separate from M18, but we cannot tell whether this other branch PF6361 is M18+ or -. Also there is no evidence of back mutations or recurrent mutations which could confuse the identification of V88.

Now putting them together we can see the difference between 1.1% R1* and 0.9% V35, and between 2.1% M18 and 1.5% PF6361, is not significant, and so it is likely that M18=PF6361, or at least one is a major subclade of the other. Francalacci says this, by the way, it is not just me coming up with it. But in any case in Contu's sample we know that M18+ is *at least* 67% of V88 (8/12), and in Francalacci's sample M18 is *up to* 62% (18/29) of V88, not contradicting the result of Contu. So M18 is most common in Sardinian V88.

Now if you have evidence to the contrary, I would be happy to see it.

I apologize if I sometimes did not explain things clearly.
It's true that I was not clear in saying "All currently available data show that the V88+ in Sardinia are more M18-".
I was not only talking about the data in Smal's or YFull's tree, but also about the work that led to the discovery of M18, the definition of M18, its registration at dbSNP and its use in assigning haplogroups.
M18 was discovered in Phase I of the 1000 genome Project by a team of several people, including A. UNDERHILL and J. OEFNER. This work was published in 1997 under the title "Detection of Numerous Y Chromosome Biallelic Polymorphisms by Denaturing High-Performance Liquid Chromatography".

M18 (rs3909) was recorded in 1999 and updated in 2003 by OEFNER as follows:
Variation class: deletion/insertion variation
RefSNP alleles: -/AA, i.e. del/AA
Ancestral allele: Not available

This is the classification currently in force in dbSNP.
This record does not indicate that the ancestral allele of M18 is - (del) and the derived allele is AA.
The question is: how did people later decide that del is ancestral and AA derived?

YSEQ cites the work listed above as its reference.
First, UNDERHILL et al. ask that the results of their study be interpreted with caution because of drift, selection, and unbalanced sampling of individuals and populations. The authors of this study do not say that - (del) is ancestral.
Second, M18 (rs3909) is part of the pseudogene TXLNGY, formerly known as CYorf15A, which is located in the AZFb (azoospermia factor b) region. AZFb extends from the P5 palindrome (of the male-specific region) to the proximal arm of the P1 palindrome.
During spermatogenesis, many microdeletions arise in the AZFb region by recombination between palindromes P5 and P1. This could explain the presence of the deletion at marker M18 in many people. What often arises in this AZFb region of the Y chromosome is deletion, not insertion. So the ancestral value of marker M18 could be the insertion, not the deletion. Sometimes the two can arise simultaneously (deletion plus insertion).

Contrary to what you say, there is evidence of recurrent mutations by X-Y gene conversion that could confuse identification of the V88 marker.
Indeed, the marker V88 is in a Y male-specific region characterized by high X-Y identity, and its derived allele (C) is identical to its X gametologous base, which is also C.
So the V88 marker alone cannot be used to define a branch of a haplotree. We need more SNPs. All studies in which only the V88 marker was analyzed contain many errors and are not fair.

Megalophias
05-04-2016, 05:13 PM
M18 was discovered in Phase I of the 1000 genome Project by a team of several people, including A. UNDERHILL and J. OEFNER. This work was published in 1997 under the title "Detection of Numerous Y Chromosome Biallelic Polymorphisms by Denaturing High-Performance Liquid Chromatography"....
The question is: how did people later decide that del is ancestral and AA derived?
Read the paper you cited.


During spermatogenesis, many microdeletions arise in the AZFb region by recombination between palindromes P5 and P1. This could explain the presence of the deletion at marker M18 in many people. What often arises in this AZFb region of the Y chromosome is deletion, not insertion. So the ancestral value of marker M18 could be the insertion, not the deletion.
Again, look at the cited study. All but one of 718 men, including representatives of A, B, C, D, and E, plus at least one chimpanzee, had the ancestral value for M18. Other studies have searched extensively for M18, it is very rare and specific in its distribution, so clearly it is not something that recurs willy-nilly.


Contrary to what you say, there is evidence of recurrent mutations by X-Y gene conversion that could confuse identification of the V88 marker. So the V88 marker alone cannot be used to define a branch of a haplotree. We need more SNPs.
Do you have any *actual evidence* that there is a problem with V88?

Let's look at Cruciani et al 2010, which first identified V88 and found it at high frequency in Central Africa. They did not test only for V88, but first for M343, then also for M18, M73, M269, M335, P25, P297, V7, V8, V35, V69, and 11 STRs. V88 shows up always on R1b-M343(xM269) background. African V88* haplotypes cluster together with V69 haplotypes, and on the Y-Full tree V69 and an African V88* full sequence form a clade just as expected.

[Attachment 9149]


All studies in which only the V88 marker was analyzed contain many errors and are not fair.
"Not fair"? It's not a competition, you aren't in the R1b World Cup against Cameroon. You haven't come up with a single error, only vague possibilities of what could go wrong.

grouza31
05-09-2016, 04:28 PM
Read the paper you cited.


Again, look at the cited study. All but one of 718 men, including representatives of A, B, C, D, and E, plus at least one chimpanzee, had the ancestral value for M18. Other studies have searched extensively for M18, it is very rare and specific in its distribution, so clearly it is not something that recurs willy-nilly.


Do you have any *actual evidence* that there is a problem with V88?

Let's look at Cruciani et al 2010, which first identified V88 and found it at high frequency in Central Africa. They did not test only for V88, but first for M343, then also for M18, M73, M269, M335, P25, P297, V7, V8, V35, V69, and 11 STRs. V88 shows up always on R1b-M343(xM269) background. African V88* haplotypes cluster together with V69 haplotypes, and on the Y-Full tree V69 and an African V88* full sequence form a clade just as expected.

[Attachment 9149]


"Not fair"? It's not a competition, you aren't in the R1b World Cup against Cameroon. You haven't come up with a single error, only vague possibilities of what could go wrong.


Show me a single paragraph of this study stating that del is the ancestral allele of M18. I see none. The authors compiled a summary table of the SNPs found, with their alleles. Here is an excerpt of this table.
M16 C → A
M17 → -1bp
M18 → +2bp
M19 T → A
In this table, no ancestral value is indicated for M18 unless you consider the arrow as del; if that is the case, you must also consider the arrow as del at M17, whose derived allele is del. In that case, M17 would have del as its ancestral allele and del as its derived allele as well.
M18 is an indel and not a SNP. A large part of the difference between the human genome and that of the chimpanzee comes from indels. Indels in the chimpanzee do not match human indels: the lengths of human indels are not the same as those of chimpanzee indels. You cannot say that the 2 bp deletion is found in the chimpanzee as the same 2 bp deletion. The 2 bp deletion at M18 is not ancestral in the chimpanzee. For your information, indels in the chimpanzee Y chromosome are the result of insertion of chimpanzee-specific endogenous retroviruses, which are not found in the human Y chromosome.
You often like to say that all ancestral alleles of SNPs and indels are found in the chimpanzee.
The Y chromosome of the chimpanzee is completely different from that of humans. The male-specific region (MSY) of the chimpanzee is different in structure, nature and gene content (see excerpt below). The difference is between 30 and 100% depending on the MSY area.
The chimpanzee MSY has no X-transposed zone, so the difference from the human MSY there is 100%. If you hope to find the ancestral allele of the marker V88 in the chimpanzee MSY, you will wait millions of years without finding anything, because V88 is located in the X-transposed zone in humans and chimpanzees have no X-transposed zone on their Y chromosome.

[Attachment 9208]


I was talking about SNPs at the V88 level when I wrote that we cannot test only V88 and conclude whether or not a person belongs to the V88 branch, and that we need more SNPs.
There are some cases at FTDNA. Because of X-Y gene conversion at V88, many people tested negative for V88 and were placed in the M269 branches by single-SNP testing. But when tested with Big Y, some realized that they are in a sub-branch of V88 and not M269.

Megalophias
05-10-2016, 05:06 AM
In this table, no ancestral value is indicated for M18 unless you consider the arrow as del; if that is the case, you must also consider the arrow as del at M17, whose derived allele is del. In that case, M17 would have del as its ancestral allele and del as its derived allele as well.
You must be actually trying quite hard in order to fail to understand this.


Because of X-Y gene conversion at V88, many people tested negative for V88 and were placed in the M269 branches by single-SNP testing. But when tested with Big Y, some realized that they are in a sub-branch of V88 and not M269.
Source for this claim? Numbers?

grouza31
06-14-2016, 10:01 AM
FTDNA has re-audited its SNPs and updated the haplotree.
Here are the data concerning a member of R-V88.
[Attachment 9794]
The SNPs CTS10834, PF6520, PF6469, PF6470, PF6477, PF6479 and PF6419, which were previously at the M269 level, are now placed at the R-CTS623 level just below R-P297.
The SNPs PF6522 and PF6532 have been removed from the M269 level. These last two SNPs have now also been removed from the R-V88 member's data.
The SNPs PF6464 and PF6484, which were previously at the R-P297 level, have been removed from the tree and have now also been removed from the R-V88 member's data.
The R-V88 member is presumed positive for SNP P297.
Does anyone disagree with these new data from FTDNA? Still, these data indicate that some members of the R-V88 branch have SNPs leading to the R-M269 branch, unless the R-M269 branch is not well positioned relative to R-P297.

grouza31
08-31-2016, 12:47 PM
How can the X chromosome help to build or improve the Y haplotree?
The answer may come from the patent titled "Genes in the non-recombining region of the Y chromosome", published under the numbers WO1998046747 and US 6,103,886.
The non-recombining region of the Y chromosome contains genes that have homologues on the X chromosome. The amino acid sequences of the X- and Y-encoded proteins are 83 to 97% identical, depending on the gene. The X-homologous genes reflect the common ancestry of the X and Y chromosomes. Nearly all ancestral gene functions have deteriorated (by deletion, substitution, addition or mutation) on the non-recombining portion of the Y chromosome while being maintained on the X chromosome.
What we can infer from this patent is: if a Y SNP is part of a gene that has a homologue on the X chromosome, one can determine the ancestral value of this Y SNP simply by analyzing the counterpart gene on the X chromosome. There is no need to search for the ancestral value of this human Y SNP on the chimpanzee Y chromosome, given that the chimpanzee male-specific region is totally different from that of humans in structure, nature and gene content.
For example, the indel M18 (branch R1b-V88) is part of the TXLNGY gene. This gene has a homologue on the X chromosome (TXLNG). To determine the ancestral value of M18, simply sequence the TXLNG gene on the X chromosome.
Another example: the SNP M269 is part of the gene EIF1AY. This gene has a counterpart on the X chromosome (EIF1AX). The amino acid sequences of the EIF1AY- and EIF1AX-encoded proteins are 97% identical. To confirm the ancestral value of M269, analyze the sequence of the EIF1AX gene on the X chromosome.

TigerMW
10-14-2016, 01:41 AM
Have you requested that FTDNA add phylogenetic equivalents for the basal branches within R1b? I recognize that some of these branches have excessive strings of phylogenetic equivalents but I recommend requesting addition to the haplotree for all of those you feel are valid SNPs, particularly those in CombBED regions, which are predisposed to being safer anyway.

It is to our benefit that we get as many SNPs registered on the FTDNA haplotree as we can over the next several weeks.

They have several major projects going and it will be good to have any and all valid SNPs in R1b on the tree before they complete some of these projects. There is an advantage for SNPs to be on FTDNA's "known" list as well as on the haplotree. They generally only accept SNPs for the haplotree that have at least two derived instances in their database.

MitchellSince1893
10-14-2016, 03:35 AM
Just to add, FTDNA was very quick to include a new branch. Sent the request Sunday night and it was added Monday morning.

PS. Mike, I sent you an email/PM asking for some guidance.

grouza31
11-03-2016, 05:16 PM
Let's go back to the origin of the members of R-V88 in Africa, the Middle East and Europe. First we will talk about the origin of the R-V88 members who arrived in Africa. We read everywhere that R-V88 arrived in Africa during the Neolithic period and came from the Fertile Crescent.
These are the Eurogenes ANE K7 results of an R-V88 member:
ANE: 2.59%
ASE: 0.77%
WHG-UHG: 0%
East Eurasian: 0%
West African: 17.33%
East African: 79.31%
ENF: 0%

These results are of course to be used with caution.
We note that there is no trace of ENF (Early Neolithic Farmers from the Fertile Crescent) in his autosomal DNA, while ANE (Ancient North Eurasian) is at 2.59%. With no trace of ENF, we can draw one of two conclusions: either the migrant group to which his ancestors belonged did not come from any part of the Fertile Crescent, or this group arrived very recently from North Eurasia and passed through the Fertile Crescent on its way to Africa without interbreeding there.
If these Eurogenes ANE K7 results are even roughly accurate, the origin of many people in the R-V88 branch and its sub-branches of the R1b haplotree can be questioned.

grouza31
07-05-2017, 03:48 PM
This is the migration path of the R-Y18458 branch of V88: from Anatolia to Syria, then from Syria to Medina (Saudi Arabia), then from Medina to Mecca (Saudi Arabia). From Mecca, part of this branch went into Africa, entering through Egypt. This branch arrived in Egypt around AD 750. From Egypt it went to Libya, then from Libya to central Africa (Chad, Cameroon, Nigeria). When they arrived in Africa, they were reddish and were not Muslim. Part of this group (the aristocracy) converted to Islam in the 13th century. Here are some of their customs and traditions before their conversion to Islam:
- They traced descent through the mother and not the father,
- Every year they killed animals and put the blood on the doors of their houses and their city,
- They had a certain adoration for the snake,
- They attached importance to a very sacred object that they said they had inherited from their ancestor King Saul.
They like to name the places they occupy after their places of origin. This is the case, for example, with the name Gabala, which is found in Syria, Africa, and the Caucasus.
On the other hand, there is a genetic link between this group and some Irish people. Are there Irish people who are V88 and whose ancestors came from Anatolia or the Middle East?

Theory2.5
04-02-2019, 03:26 PM
What is the name of this people? Are there any webpages telling their story that you could reference for people wanting to learn more about them?