Analysis of Klyosov's Methodology



jeanL
07-09-2013, 06:09 PM
Anatole Klyosov refers to this paper: Mutation Rate Constants in DNA Genealogy (Y Chromosome) (http://www.scirp.org/Journal/PaperInformation.aspx?paperID=8688) as the paper that explains his methodology more thoroughly. So I thought it would be good to look into the context, since he often claims that his methodology has been "cross-verified using thousands of haplotypes from different haplogroups".

In this paper Klyosov states in the abstract that:




The basic principles of DNA genealogy and the mutation rate constants for haplotypes of Y chromosome are considered. They are exemplified with 3160 haplotypes, 2489 of those in the 67 marker format, with 55 DNA lineages, 11 of them having documented confirmed common ancestors. In total, they cover 8 haplogroups and the time range from 225 to ca. 8000 years before present. A series (including 67 marker, 37 marker, 25 marker, 16 marker mostly of the Y filer haplotype panel, 12 marker, as well as the “slowest” 22 marker and its subset of 6 marker haplotypes) were calibrated using documented genealogies (with a number of lineages which allegedly descended from some legendary and/or mythical historical figures that were examined and verified employing the calibration plots). The study principally confirms a number of previously made or assumed theoretical foundations of DNA genealogy, such as a postulated stochastic character of mutations in non-recombining parts of DNA, the first-order kinetics of mutations in the DNA, the same values of the mutation rate constants for different haplogroups and lineages, and the principles of calculating timespans to the most recent common ancestors taking into account corrections for back (reverse) mutations.

He goes further in terms of what his first order kinetic assumption means:



Mutations in the STRs occur as shortening or lengthening of the respective chain by (commonly) one repeat unit, along with much more rare events of change by several units (multi-step mutation), deletion, or duplication of the whole marker or its parts. All carefully done and reliable studies (including those on father-son pairs) indicate that the mutations occur randomly, and they do not depend on a particular haplogroup, a population, a race, or a time period, whether it happened recently or a long time before present. All studies which claim otherwise have turned out to be methodologically flawed. These include studies that mixed different DNA-lineages, mixed different populations, haplogroups, etc. In brief, DNA genealogy is based on the concept of a so-called molecular clock, i.e. on the fact that average rates of mutations in haplotypes are practically constant for millions of years. They do not depend noticeably on any external factor (such as climate, solar radiation, diet, etc.) and they do obey the first order kinetics.

I bolded two important parts:

The first part is Klyosov's claim that all studies to date which claim otherwise have been methodologically flawed. Yet he offers no reference; he simply says that the flaw was due to mixing different DNA-lineages, different populations, haplogroups, etc. But just a few sentences earlier he said that the mutation rate doesn't depend on "a particular haplogroup, a population, a race, or a time period".

The second part is where he makes it clear that the mutation rates are constant regardless of events such as climate, solar radiation, diet. This is a key point in his assumptions and methodology, since mutation rates are indeed a function of the step-size, but let's just focus on his paper.

In Figure-1 Klyosov shows the calibration plots from which the mutation rate of 0.00183 mutations/marker/generation was obtained using FTDNA Genealogical Projects.

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosovetal2011-Figure-1amp2_zps949f0a45.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosovetal2011-Figure-1amp2_zps949f0a45.jpg.html)


Now the first thing that I notice, which isn't erroneous per se but does make the reader question the data, is that Figure-1 shows a total of 16 data points, with their respective confidence intervals, which were used for the calibration plot. Yet in Table-1, where the genealogical project information is provided, we see only 11 pedigrees with known genealogical common ancestry.

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosovetal2011-Table-1_zps4d410cdd.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosovetal2011-Table-1_zps4d410cdd.jpg.html)

So there seem to be 5 extra points used in the calibration process where the ancestor is not known genealogically, or at least where the data was not provided. Looking at Table-1 one can see that R1a1a1 haplogroups dominate the table, making up 5 of the 11 lineages. The number of participants in some projects, namely R1a1a1g at 7, R1b1a2a1b at 11 and N1c1d at 4, seems rather small, yet the latter two projects make up 50% of the data used with genealogically known MRCA in the timeframe of 20-40 conditional generations.

For completeness I will include Figures 2, 3, 6, 7 and 8, but for now I will mostly focus on Figure-1, which is the calibration dataset.

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosovetal2011-Figure-3amp4_zpsf9fb5cb7.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosovetal2011-Figure-3amp4_zpsf9fb5cb7.jpg.html)

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosovetal2011-Figure-6amp7_zpsfa7c0e63.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosovetal2011-Figure-6amp7_zpsfa7c0e63.jpg.html)

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosovetal2011-Figure-8_zps24e4b1ef.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosovetal2011-Figure-8_zps24e4b1ef.jpg.html)

The only question a reader could have is this: if some datapoints lie well beyond the calibration time frame, why does the paper say that these are Y-chromosome lineages with well-defined common ancestors? Also, given that some of them deviate from the trendline, it seems as if they were calculated using a different methodology, but the paper doesn't say much about it. Perhaps they were calculated using a different STR format, and then the calculation was repeated using 37 STRs. In any case, the idea that their common ancestor is well defined defies logic, since it is not really known when the common ancestor lived; it was instead estimated using his methodology.

jeanL
07-09-2013, 06:40 PM
Now onto Figure-1. I was able to reproduce the figure using Excel by carefully measuring the data point coordinates. For the purpose of the exercise I did not include confidence intervals, but they could in fact be included if needed.

Here is a copy of his Figure-1.

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosov-AnalysisFigure-1_zpsf6d7d372.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosov-AnalysisFigure-1_zpsf6d7d372.jpg.html)

Everything agrees: the slope of the line is 0.00183 and the R2 value is 0.901, which yields an R value of 0.95.

The first thing I noticed was that there were five datapoints which were increasing the correlation significantly, namely these points:

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosov-AnalysisFigure-1-points_zps2d4b6094.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosov-AnalysisFigure-1-points_zps2d4b6094.jpg.html)

They seem to come from 5 pedigrees having MRCA in the timeframes of ~ 12 (300 ybp) conditional generations, 15 (375 ybp) conditional generations, and 27-28 (675-700 ybp) conditional generations. Looking at Table-1:

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosovetal2011-Table-1_zps4d410cdd.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosovetal2011-Table-1_zps4d410cdd.jpg.html)

It seems at least one of these 5 pedigrees is that of the known MacDonald clan. In any case, I went ahead and removed these points to see what the fit would look like with the remaining 11 pedigrees.

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosov-AnalysisFigure-1-datapointsremoved_zps91efbe00.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosov-AnalysisFigure-1-datapointsremoved_zps91efbe00.jpg.html)
Now it seems the R2 value decreased to 0.8639, which translates into an R value of 0.93. What is interesting is that a 2nd order polynomial fit seems to provide a better fit in both cases: where all the datapoints are used, and where all 5 of the points giving “good fit” are removed.

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosov-AnalysisFigure-1-Polyfit_zpsd3427dd6.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosov-AnalysisFigure-1-Polyfit_zpsd3427dd6.jpg.html)

This yields an R value of 0.9586 using all datapoints.

http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosov-AnalysisFigure-1-datapointsremovedpolyfit_zps57739693.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosov-AnalysisFigure-1-datapointsremovedpolyfit_zps57739693.jpg.html)

This yields an R value of 0.9461 using all but the 5 datapoints removed.

As I increased the polynomial degree I saw a better fit (data not shown), as demonstrated by the R2 value.

Now why would someone use a linear approach instead of a polynomial, when it is clear that the polynomial provides a better fit?
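One caveat on the question above: for least-squares fits of nested polynomials, R2 can never go down when the degree goes up, so a higher R2 on its own does not show that the polynomial is the better model. Here is a minimal sketch of that effect. It uses synthetic data, not the digitized Figure-1 points (which are not posted here); the noise level, the 16 evenly spaced points, and the helper names are all assumptions made for illustration. The data are generated from a strictly linear model with the slope quoted above, and an adjusted R2 is printed alongside as one common way to penalize extra terms.

```python
import random

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first) via normal equations."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(xs, ys, coeffs):
    """Coefficient of determination for a fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
                 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_points, degree):
    """R2 penalized for the number of fitted terms."""
    return 1.0 - (1.0 - r2) * (n_points - 1) / (n_points - degree - 1)

random.seed(1)
SLOPE = 0.00183  # mutations/marker/generation, the slope quoted above
generations = [5 + 35 * i / 15 for i in range(16)]  # 16 synthetic points (assumption)
xs = [g / 40 for g in generations]                  # rescaled for numerical stability
ys = [SLOPE * g + random.gauss(0, 0.008) for g in generations]  # truly linear + noise

r2_by_degree = {}
for degree in range(1, 5):
    r2 = r_squared(xs, ys, polyfit(xs, ys, degree))
    r2_by_degree[degree] = r2
    print(degree, round(r2, 4), round(adjusted_r_squared(r2, len(xs), degree), 4))
```

Even though the underlying relationship here is linear by construction, the plain R2 column does not decrease with degree. A comparison on the real digitized points would need something like adjusted R2, residual plots, or a test of whether the extra coefficient is distinguishable from zero.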

Something else to take account of, is that in the calibration timeframe many of the slow markers yielded zero mutations, which might account for the relative similarity in the mutation rates for the different sets displayed on figure-1. This is observed in figure-7, where one can see many of the empty squares at the zero mutations points.
http://i1133.photobucket.com/albums/m582/jeanlohizun/Klyosovetal2011-Figure-6amp7_zpsfa7c0e63.jpg (http://s1133.photobucket.com/user/jeanlohizun/media/Klyosovetal2011-Figure-6amp7_zpsfa7c0e63.jpg.html)

That’s all I have so far….

jeanL
07-15-2013, 07:28 PM
Having asked Keith Britton for permission, I will post his analysis of Klyosov's paper, which he sent me via email. I think it is highly informative and thorough:




My approach was slighter than yours, as I'm an antique, not inclined to put out the effort to digitize the data from the plots and not equipped with current software and expertise to simplify that. I was also expecting to speak mostly to people with limited familiarity with probability/statistics. On the other hand, I go back to when trivial application of formulae didn't obscure the actual principles involved, or introduce apparent and unsupported precision. Much is best displayed graphically, if one understands what plots mean.

Read the text in the paper regarding Figure 2. He extends his Figure 1 data by adding 6 points, all pairs. Two doublets are alternative, i.e. mutually exclusive sources for a single event. The remaining doublet denotes the extremes of a range for a single event. Then, and critical for condemnation in respect to integrity or workmanship, he gives improved values for slope and correlation. Note that the caption is worded to imply that the points from Figure 1 are not used, only included for comparison. If so, then he bases his fit and correlation on just three events, properly three points...

If he uses all six points, then he is guilty of using at least two which must be known to be erroneous to "improve" his results, plus the two which badly represent the single point with maximum moment on his slope. Crudely, half his added data is "known crap", and when mutually exclusive genealogical claims clearly exist, logically both may be erroneous. He did have the valid option of using the data in a manner which indicated its limitations, but he cannot justify numerically doubling its weightings. At first glance, this impugns his integrity, but he is open and specific about what he did. It follows that a more likely explanation is that he is simply grossly ignorant of statistical methodology and so prone to lousy workmanship. Probably a better question is how something this bad got past peer review.

A first question is whether data can be linearized, since if it can't, statistics is out the door. Immediately following that, is the question of the correct form of the fitting equation. Mostly, both questions are illuminated by trialling options and then investigating the one with the most tractable residuals. It takes but a glance to see if variance in Y is a function of X. If so, the plot is wedge shaped and, if a log plot doesn't clear that up, you need another term in the equation. Data for Figure 1 is not wedge shaped but is questionable because of clustering. That of Figure 2 appears OK, but that's because of the doubling and visual inclusion of the Figure 1 data. Figure 7 data is grossly wedge shaped, Figure 8 grotesquely so. Figures 3-6 are more interesting and require closer inspection.

Figure 3 appears slightly wedge shaped, Figure 4 more so but perhaps more trumpet shaped and/or with perhaps curvature. Figure 5 appears very gently wedge shaped. Closer study of Figure 3 suggests a domain change at about x=200, repeated in Figure 4. Here, judgement or sophisticated testing is needed to distinguish between a multifactorial case where a term slowly begins to assert itself to dominate late versus a domain situation where the equation forms differ, possibly by introduction of a new factor. Close comparison between the plots shows that variance is notably worse for Figure 4, with an interesting shift of the mean below the line until x=200. That's pretty conclusive in supporting the domain change case. The obvious quick test is to fit to x=200, fit x=200 to x=350, and compare both visually, by equation and by Correlation. Trickier, is the higher domain curved, and if so, what is the equation needed to linearize it? Figure 5 is transitional. Separate domains might be present, but they are visually unremarkable compared to the preceding, except in one respect. Notably, the "comparison" group have deteriorated in variance.

Figures 6 and 7 are also interesting, individually and in comparison. The former initially looks simple linear, if with rather wide variance, but a closer look shows that some violence has been done to the formerly tidy "comparison" group, not just by increased variance but, if fitted separately, a clearly different slope. What does one see if the filled squares are also separately fitted (omitting the implied point at the origin)? It gets drastically worse with Figure 7. The "comparisons" go to garbage and the x=200 break may be back, a fit to all solid points probably wouldn't pass through the origin, probably neither would a fit for solids below x=200, and a fit to those above (wedge shaped) would further diverge in slope, probably introduce curvature not just an inflection, and likely require another term in the equation.

So is all this real? Well, yes and no. It's real insofar as it's analysis of the data as displayed in the figures, but it does not consider artifact in that data's origin, nor do equations recovered necessarily reflect the mathematics of the processes underlying mutation. Again there's artifact, so information recovered may largely, or principally, reflect the mathematics implicit in the authors' methodology.

Stepping back to the beginning, this all starts with an empirical fit to the "comparison" data. IF the data are good and the doublets resolved, it's a good linear fit, mutation steps against time. But one must not lose sight of the intrinsic nature of an empirical fit (though it goes by in the first chapter or so of an elementary Statistics text book). A demonstrated fit is logically unassailable and may be relied upon within its domain. NO causal relationship has been established or is required. NO extrapolation beyond the observed domain is safe. ANY change within the domain destroys it, requiring re-evaluation. Axes are equivalent, as there is no intrinsic causal relationship, so X and Y have no conventional meaning. Coefficient of Correlation is valid, but measures of dispersion are in the province of causally related data. It follows that the Klyosov method should work, and reasonably well, to 1,000 b.p., and probably to 2,500 b.p. Extrapolation beyond that really requires causal relationship, and, within that span, that conditions for breeding, notably generation time, remained stable. Less obvious, sampling must be random and not systematically affected by time.

The authors had to move to a causal relationship where the equations had meaning regarding the processes of mutation. Unfortunately, Klyosov elected to place time on his X axis, compounding that by measuring time in "Conditional Generations", thus successfully confusing time with generations as understood in genetics. Statistical treatment is built on the concept of dependent and independent variables, the latter conventionally assigned to the X axis.

To use his equations, the independent variable is the number of observed mutation steps, TMRCA, i.e. time, being dependent on that. It follows that what we want to see is the variance in time versus observed data - the number of mutation steps. It further follows that the variance shown in Figure 1, by (dubious) error bars, is actually variance in conventional X, the independent variable. That blows away common statistical analysis, because x values are not accurately known.

Not content with this, the authors further confuse or convolve things by systematically processing that x data before use. Statistical methodology requires raw data. A fit to raw data will produce an equation; a fit to the same data as manipulated will probably produce the same equation plus one or more terms reflecting the mathematics of the manipulation.

Along the same lines, whence do the authors' later time data come from? I may have missed it, but the text suggests calculation from the initially established equation. There's an unsubtle circularity in that approaching chutzpah - or simple gross ignorance of basic procedure. It's certainly adequate explanation for the persistence of the slope of the initial empirical fit in extrapolation beyond its domain. What else do we, and should we, see with a cursory glance at these data?


To be continued....
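The point above about x values not being accurately known can be illustrated with a small simulation. This is only a sketch on synthetic data, assuming a true linear relation with the 0.00183 slope from Figure-1 and an arbitrary amount of error in the time axis; the sample size, noise levels and variable names are made up for illustration. Ordinary least squares treats x as exact, so when x carries error the fitted slope is pulled toward zero.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x (x assumed exact)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

random.seed(7)
TRUE_SLOPE = 0.00183  # mutations/marker/generation, slope quoted in the thread
N = 5000              # synthetic sample size (assumption)

# True times in conditional generations, and the mutation counts they produce.
true_gen = [random.uniform(5, 40) for _ in range(N)]
mutations = [TRUE_SLOPE * g + random.gauss(0, 0.005) for g in true_gen]

# The same times as an analyst would see them, with genealogical dating error.
observed_gen = [g + random.gauss(0, 8) for g in true_gen]

slope_exact_x = ols_slope(true_gen, mutations)
slope_noisy_x = ols_slope(observed_gen, mutations)

print("slope with exact x:", slope_exact_x)
print("slope with noisy x:", slope_noisy_x)
```

The size of the bias depends on how large the error in x is relative to the spread of x, so the numbers here say nothing about the actual size of the effect in the paper. Methods that allow error in both axes (for example Deming regression) exist for this situation.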

jeanL
07-15-2013, 07:29 PM
This is the second part of it.



If mutation rates are indeed various and strictly stochastic, then the principles of the Law of Large Numbers apply - with caveats. The Central Limit Theorem is relevant, and the larger the numbers of markers used, the more consistent will be the mean and distribution backing the derived "mutation rate". That said, two variables are involved. Time is infinitely variable, but stepwise mutations are binomial, so only an approximation to a "normal" distribution is possible. It's actually worse than that, because individual marker's mutation rates are fixed (per definition above), introducing a higher order binomial factor. Further, we have to consider both the number of samples and the size of the sampled population, plus complication from systematic and random sampling difficulties.


Where numbers are few, in samples or population, variance increases. Where sampling selects for a higher proportion of "slow" markers, drastically fewer samples are available for shorter times, hence wrecking the "comparison" group. (This logically denies established original empirical fit for those markers and implies systematic differential effects for later times, see below.) The author's "correction for back mutation" is binomial and systematically increasingly important with increasing time and faster markers, so should be visible as a tardily apparent component, perhaps with curvature and increase in variance.


Then there is the authors' designation of all this as "a first order" process, with reference to such as understood in chemistry. If not asserted as a (poor) analogy, it presumably stems from ignorance. The essence of first order chemical reactions is that the reacting elements are identical and the process thus rate limited only by the remaining concentration. Rate decay is thus exponential with time. The authors assume exactly that in their "log method", markers exhibiting mutation being progressively removed from the sampled population. If all markers had the same mutation rate, that would be valid, but they don't, so the curve cannot be exponential with time. Consider mutation on five markers with monotonically decreasing mutation rates. From probability, the two fastest will mutate with a mean time significantly shorter than that of the two slowest, i.e. the mean mutation rate for the population will systematically tend to increase[[[should read "decrease" ]]] with every event.


As noted above regarding "slow" markers, systematic effects are to be expected with increasing time. A "back mutation" correction may or may not prove non-linear, but again, these are not infinitely variable data but binomial in character, and numbers matter. As slower markers are brought into play, their impact is not proportional but abrupt and stepwise, few in number but disproportionately contributing to variance.

[[[Mikewww/Moderator: Correction per JeanL post #5 ]]]

jeanL
07-26-2013, 04:39 PM
For some reason I cannot edit my entries. I have been informed that there was a typo above:




Then there is the authors' designation of all this as "a first order" process, with reference to such as understood in chemistry. If not asserted as a (poor) analogy, it presumably stems from ignorance. The essence of first order chemical reactions is that the reacting elements are identical and the process thus rate limited only by the remaining concentration. Rate decay is thus exponential with time. The authors assume exactly that in their "log method", markers exhibiting mutation being progressively removed from the sampled population. If all markers had the same mutation rate, that would be valid, but they don't, so the curve cannot be exponential with time. Consider mutation on five markers with monotonically decreasing mutation rates. From probability, the two fastest will mutate with a mean time significantly shorter than that of the two slowest, i.e. the mean mutation rate for the population will systematically tend to increase (this should read "decrease") with every event.
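The five-marker argument in the corrected paragraph can be put into numbers. The sketch below follows the picture as the paragraph describes it: each marker is treated as an independent first-order decay and the markers are pooled. The five rate values are hypothetical, chosen only so that they decrease monotonically, and whether the paper's logarithmic method actually pools markers this way is not checked here. The instantaneous rate of the pooled population is the rate-weighted average over the markers that have not yet mutated, so it starts at the plain mean and falls as the fast markers drop out.

```python
import math

# Hypothetical per-generation mutation rates for five markers, fastest to slowest.
RATES = [0.008, 0.004, 0.002, 0.001, 0.0005]

def unmutated_fraction(t, rates):
    """Expected fraction of pooled markers still unmutated after t generations."""
    return sum(math.exp(-k * t) for k in rates) / len(rates)

def effective_rate(t, rates):
    """Instantaneous pooled rate, -d ln(S)/dt, where S is the unmutated fraction."""
    weights = [math.exp(-k * t) for k in rates]
    return sum(k * w for k, w in zip(rates, weights)) / sum(weights)

times = [0, 100, 250, 500, 1000]
mean_rate = sum(RATES) / len(RATES)

# Mixed rates: the pooled rate drifts downward with time.
mixed = [effective_rate(t, RATES) for t in times]
# Identical rates: the pooled rate stays constant (true first-order kinetics).
uniform = [effective_rate(t, [mean_rate] * len(RATES)) for t in times]

for t, m, u in zip(times, mixed, uniform):
    print(t, round(m, 6), round(u, 6), round(unmutated_fraction(t, RATES), 4))
```

With identical rates the log of the unmutated fraction is a straight line in time; with mixed rates it bends, which is the sense in which the pooled curve "cannot be exponential".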

TigerMW
09-30-2013, 05:04 PM
Now onto Figure-1. I was able to reproduce the figure using Excel by carefully measuring the data point coordinates. For the purpose of the exercise I did not include confidence intervals, but they could in fact be included if needed.
....
Everything agrees: the slope of the line is 0.00183 and the R2 value is 0.901, which yields an R value of 0.95.

The first thing I noticed was that there were five datapoints which were increasing the correlation significantly,
...
They seem to come from 5 pedigrees having MRCA in the timeframes of ~ 12 (300 ybp) conditional generations, 15 (375 ybp) conditional generations, and 27-28 (675-700 ybp) conditional generations.
...
It seems at least one of these 5 pedigrees is that of the known MacDonald clan. In any case, I went ahead and removed these points to see what the fit would look like with the remaining 11 pedigrees.
...
Now it seems the R2 value decreased to 0.8639, which translates into an R value of 0.93. What is interesting is that a 2nd order polynomial fit seems to provide a better fit in both cases: where all the datapoints are used, and where all 5 of the points giving “good fit” are removed
...
This yields an R value of 0.9586 using all datapoints.
...
This yields an R value of 0.9461 using all but the 5 datapoints removed.

As I increased the polynomial degree I saw a better fit (data not shown), as demonstrated by the R2 value.

Now why would someone use a linear approach instead of a polynomial, when it is clear that the polynomial provides a better fit?

Something else to take account of, is that in the calibration timeframe many of the slow markers yielded zero mutations, which might account for the relative similarity in the mutation rates for the different sets displayed on figure-1. This is observed in figure-7, where one can see many of the empty squares at the zero mutations points.
...
That’s all I have so far….

Does the use of a linear method versus a polynomial method significantly distort the resulting estimates?
Are there examples of polynomial-based methods out there? How do they compare in the resulting estimates?

What's the net effect of using the slow markers with zero mutations? Are you saying they are biasing the estimates? If the target data set is large enough to have mutations on those slow markers, does that resolve the issue? And what's large enough?

GailT
10-01-2013, 05:18 AM
At first glance, this impugns his integrity, but he is open and specific about what he did. It follows that a more likely explanation is that he is simply grossly ignorant of statistical methodology and so prone to lousy workmanship. Probably a better question is how something this bad got past peer review.

His papers are not peer reviewed, they are self-published or published in an open access "journal" for which he is the editor. His pseudo-science would never be published in a legitimate journal.

MJost
10-01-2013, 01:21 PM
I am not really in this fight, but it seems to me that this thread has turned into a blood feud, a character assassination process, instead of a good discussion of Anatole Klyosov's methods. There has been some solid critique(s).

Are those who participate as DNA genetic researchers, number crunchers and/or statistical experts, and who post on the various genealogical forums, being reviewed, or requesting to be reviewed, by peers or otherwise?

Have DNA genetic researchers such as Kenneth Nordtvedt, Charles F. Kerchner, Jr., John F. Chandler, Doug McDonald, etc. been peer reviewed? Yet these members of the growing genetic genealogy community have been credited with making useful contributions to knowledge in the field.

Are other publication sites such as The Journal of Genetic Genealogy (www.jogg.info) considered a peer review arm? Yes, reviewers are involved in a peer review publishing process. I believe Anatole Klyosov has published there, contrary to the claim that "His pseudo-science would never be published in a legitimate journal."

MJost

GailT
10-01-2013, 06:50 PM
This is not a "blood feud". Anatole makes extraordinary claims that are not backed up by sound science. He claims that every study showing African origins of y-DNA and mtDNA published in the last 25 years is wrong. He insults anyone who points out the flaws in his method, he appears not to understand the most basic aspects of phylogeny, and he attempts to gain credibility by self-publishing his analysis. I understand that some people like his STR method for estimating dates for very recent paternal ancestry. Various methods for STR analysis give similar results, and all have some uncertainty. I'm really not interested in his STR analysis, but it worries me that a large number of people accept his complete rejection of the last 25 years of genomics research on human origins because they like his STR analysis.

MJost
10-01-2013, 07:17 PM
You know as well as I do that researchers in the education realm tear each other's research papers apart bit by bit when theories are not mainstream. AK does have his own non-African origin theories, but most readers do not agree, most phylogeny facts today do not support his position(s), and, as you suggest, they are not accepted by a "large number of people", although there appears to be a small but solid group that does accept them. Anatole's belief, if I have it correct, is derived from where the various branches appear on the tree. Is he or is he not adjusting his analysis to a racist standard? I don't believe it one bit.

Back to self-publishing: AK's doing so is no different from anyone else posting their own online links to documents and work examples, as other citizen genetic researchers have done and continue to do. These people are usually years ahead of the traditional education-based genetic researchers. AK's basic methods have been published several times on ISOGG, a peer review genetic researcher web site.

MJost

GailT
10-01-2013, 10:51 PM
When I say a "large number of people" accept AK's fringe theories, I'm speaking purely of the amateur and genetic genealogist community, where he has a significant following. And intentionally or not, he provides cover for people who, for ideological reasons, refuse to accept the modified Out of Africa theory.

Regarding y-DNA and mtDNA phylogeny, no one in the scientific community accepts AK's fringe theories, and I doubt that academic researchers even know about AK's work, just as most academics are probably unaware of a recent self published paper on big foot DNA. Who has time to study and refute every unscientific conspiracy theory?

AK makes statements about y-DNA phylogeny that are very obviously, objectively, factually incorrect. If you read the papers he has written on ancient human origins and prehistoric migrations (and his response to criticism on the genealogy-dna list), I think there are only two possible conclusions: either he doesn't understand science, or he has some other agenda.

Again, I have no problem with his STR analysis. Many other people have done STR analysis that is about as accurate or uncertain as his. But as far as I know, no one else has tried to use their STR analysis to assert that every scientific study of uniparental DNA, from 1987 to the present, is wrong.

MJost
10-02-2013, 02:56 PM
Using my TMRCA Estimator and Anatole Klyosov's Linear Method, which is built into my spreadsheet, for 8,882 L21 HTs provides an age in generations.

IntraClade Coalescence (n-1) Age        Mean Generations    StdDev in Gen
Clade A: R1b>L21-67M                    120.7               35.7

Intraclade Founder's Modal Age          Modal Gen Age       StdDev in Gen
Clade A: R1b>L21-67M                    133.2               37.5

Anatole Klyosov's Linear Method         Gen > adj           Mutations
AK Rates Linear, Clade A, Rate 0.12     114 > 129           121,174
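For anyone wanting to check the uncorrected figure in the last row, the linear method reduces to one division. The sketch below assumes that the 0.12 rate is in mutations per 67-marker haplotype per conditional generation, and that a conditional generation is 25 years (the conversion implied earlier in the thread, where 12 conditional generations is given as 300 ybp). The variable names are mine, and the back-mutation adjustment that takes 114 to 129 is not reproduced here.

```python
# Inputs taken from the table above.
n_haplotypes = 8882          # L21 67-marker haplotypes
total_mutations = 121174     # mutations counted from the base (modal) haplotype
rate_per_haplotype = 0.12    # assumed: mutations per haplotype per conditional generation

# Assumption: 25 years per conditional generation (12 generations ~ 300 ybp).
YEARS_PER_GENERATION = 25

# Linear method: average mutations per haplotype divided by the rate constant.
mutations_per_haplotype = total_mutations / n_haplotypes
generations = mutations_per_haplotype / rate_per_haplotype
years = generations * YEARS_PER_GENERATION

print("mutations per haplotype:", round(mutations_per_haplotype, 2))
print("generations (uncorrected):", round(generations, 1))
print("years before present (uncorrected):", round(years))
```

The rounded result agrees with the 114 generations shown before the adjustment in the table.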

MJost

parasar
10-02-2013, 05:20 PM
...

AK makes statements about y-DNA phylogeny that are very obviously, objectively, factually incorrect.
...

Behaviorally and technologically modern humans (MH - here construed as the ones that replaced the Neanderthal in Eurasia) spread about 60000ybp (perhaps after YTT). To me it looks like humans moved out of Africa about 140000 years back based on highly divergent A lines present in Africa. It just does not make sense, if the dispersal was 60000ybp from Africa, that they reached east Asia and Australia almost instantaneously. Therefore a convincing argument can be made that MH origin is not necessarily African but that MH could have spread to Africa at about the same time or even later than MH spreading to the rest of the world.

http://www.pnas.org/content/103/25/9381.full


We now know from studies of both the DNA patterning of present-day world populations and surviving skeletal remains that populations that were essentially “modern” in both a genetic and an anatomical sense had emerged in Africa by at least 150,000 years ago (1–7). We also know that these populations had dispersed from Africa to most other parts of the world by at least 40,000 years ago


The clear implication of these finds is that, whilst the human populations represented at Skhul and Qafzeh were essentially modern in both anatomical terms and in terms of clearly symbolic behavioral patterns, the levels of technology associated with these populations were still of strictly archaic, Middle Palaeolithic form (71, 72). Viewed in these terms, it is equally interesting that the early incursion of these anatomically modern populations into southwest Asia seems to have been a very localized and short-lived event, apparently confined to this southwest Asian region, and followed by a reestablishment of the earlier Neanderthal populations within these regions from at least 70,000 B.P. onwards, as reflected by the typically Neanderthal remains recovered from the later Mousterian levels at the Kebara cave, Tabun, Amud and Shanidar (1, 71, 72, 78). In other words, it would seem that whatever the intellectual and symbolic capacities of these early anatomically modern populations, their levels of technological and socioeconomic organization were not sufficient to withstand competition from the long-established Neanderthal populations of Eurasia during the later (and colder) stages of the Middle Palaeolithic sequence


These point strongly to the conclusion that there was only a single (successful) dispersal event out of Africa, represented exclusively by members of the L3 lineage and probably carried by a relatively small number of at most a few hundred colonists (2, 8, 28, 97). This lineage rapidly diversified into the derivative M, N, and R lineages, which are particularly well represented in modern Asian populations and which are estimated to have arrived and diversified further in southern Asia by at least 50,000 B.P. and possibly as early as 65,000 B.P. in Malaysia and the Andaman islands (8, 9, 28, 97). A similar conclusion has been drawn from recent studies of the Y chromosome evidence (97). This evidence would also conform well with the clear peak in the mtDNA distributions of Asian populations, dated broadly to ≈60,000 B.P. (23–25) (Fig. 1). This model, of course, would mean that the subsequent dispersals of anatomically and behaviorally modern populations into southwest Asia and Europe must have reached these areas substantially later, via western or central Asia (2, 8, 97).

The main problem posed by this scenario at present lies in the sparsity of well documented and well dated archaeological evidence for the early modern human colonization of Asia prior to ca.45,000 B.P., when we know that early colonists had reached parts of northern and southern Australia, best represented by the archaeological and skeletal finds from Lake Mungo in New South Wales

GailT
10-02-2013, 06:52 PM
Behaviorally and technologically modern humans (MH, here construed as the ones that replaced the Neanderthals in Eurasia) spread about 60,000 ybp (perhaps after YTT). To me it looks like humans moved out of Africa about 140,000 years ago, based on the highly divergent A lines present in Africa. It just does not make sense that, if the dispersal from Africa was 60,000 ybp, they reached East Asia and Australia almost instantaneously.


I think there is some uncertainty about when the Out of Africa migration(s) occurred, and Dienekes has blogged on some of the research that might support an earlier date for OoA (perhaps closer to 100,000 ybp rather than 60,000 ybp). Also, I think it is possible that there could have been multiple waves of OoA, perhaps with earlier OoA expansions that did not leave a record in modern mtDNA and y-DNA. Perhaps a large 60,000 ybp migration could have interbred with a small population from a 100,000 ybp OoA migration, just as they interbred with Neanderthals and Denisovans. So yes, I think there is still a lot of uncertainty, and more research is needed to fully understand OoA and the mixing that may have occurred outside of Africa with archaic humans and, perhaps, early AMH.

But the data currently available for modern y-DNA and mtDNA clearly show that the deepest roots of both trees are located in Africa. To argue the opposite, as AK does for y-DNA, ignores all of the research on mtDNA and y-DNA completed in the last 25 years. There is an overwhelming amount of modern y-DNA and mtDNA data that shows a recent expansion out of Africa.




Therefore a convincing argument can be made that MH origin is not necessarily African but that MH could have spread to Africa at about the same time or even later than MH spreading to the rest of the world.


I don't see how you reach this conclusion. If the age for AMH is around 200,000 years, and you have diverse modern y-DNA and mtDNA found in Africa in the range from at least 200,000 to 60,000 ybp, you clearly have AMH primarily in Africa for at least two-thirds of its history. You would have to explain the lack of diversity in modern y-DNA and mtDNA outside of Africa if you argue that AMH expanded from Eurasia to Africa.

parasar
10-02-2013, 08:57 PM
...

I don't see how you reach this conclusion. If the age for AMH is around 200,000 years, and you have diverse modern y-DNA and mtDNA found in Africa in the range from at least 200,000 to 60,000 ybp, you clearly have AMH primarily in Africa for at least two-thirds of its history. You would have to explain the lack of diversity in modern y-DNA and mtDNA outside of Africa if you argue that AMH expanded from Eurasia to Africa.

The vast bulk of Y DNA in Africa is under DE whose place of origin is not known. Which means that if DE originated elsewhere, the diversity in Africa is enhanced due to the divergent lines exclusive to Africa and the ones that came later. The divergence of these Africa exclusive Y lines from the rest of Y is immense.

Even on the mtDNA side there is an almost absolute disconnect between South Asia (except in populations such as the Siddi with historically known African provenance) and Africa.

Finally, the archaeological evidence is missing in South Asia. Sure it may be found, and that would change things, but as of now gracile AMH is first seen in Australia. LM3 is quite different from later Australians (who may have had multiple sources too). LM3 also shows strong cultural processes with respect to burial, as the red ochre used in his burial must have been transported from a significant distance (cf. pp. 235 and 256 of Bones and Ochre: The Curious Afterlife of the Red Lady of Paviland by Marianne Sommer, http://books.google.co.uk/books?id=n7-BHoeyStgC&pg=PA17 ; the Red Lady of Paviland's bones have red ochre too).

So to me it appears that human ancestors left Africa very early leaving some lines to develop in isolation there. At least two groups develop in isolation for a long period of time - one in Africa and perhaps at least one other in SE/E Asia. These latter, descendants of the early African exodus and YTT survivors, overwhelm the Neanderthal in Eurasia, spread all over Eurasia, and enter Africa.

jeanL
10-05-2013, 12:52 AM
Does the use of a linear method versus a polynomial method significantly distort the resulting estimates?
Are there examples of polynomial-based methods out there? How do they compare in the resulting estimates?

Well, the linear method assumes that the slope of the line is the mutation rate, and hence that the rate is constant. A polynomial fit allows the mutation rate to vary as a function of the repeat number. If you want to read up on the relationship between mutability and repeat number, look it up on Google Scholar, as there are plenty of studies that show the relationship between the two.
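To illustrate the difference, here is a minimal Python sketch. It assumes the fit is of average mutations per marker against generations, which is only my reading of the point above, and the calibration points are invented for illustration (they are not real haplotype data). The linear fit yields one constant slope, while the slope of the quadratic fit changes along the curve.

```python
import numpy as np

# Hypothetical calibration points (NOT real data): generations to the
# common ancestor vs. average mutations per marker.
generations = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
mutations_per_marker = np.array([0.021, 0.043, 0.090, 0.195, 0.440])

# Degree-1 fit: the slope is a single, constant rate estimate.
linear_coeffs = np.polyfit(generations, mutations_per_marker, 1)
linear_rate = linear_coeffs[0]

# Degree-2 fit: the slope is the derivative, which varies with x.
quad_coeffs = np.polyfit(generations, mutations_per_marker, 2)
quad_rate = np.polyder(np.poly1d(quad_coeffs))

for g in (10, 80, 160):
    print(f"g={g}: linear slope {linear_rate:.5f}, quadratic slope {quad_rate(g):.5f}")
```

With points that curve upward like these, the quadratic slope is lower than the linear one at the young end and higher at the old end, so the two fits imply different rates depending on where on the curve you are.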



What's the net effect of using the slow markers with zero mutations? Are you saying they are biasing the estimates? If the target data set is large enough to have mutations on those slow markers, does that resolve the issue? And what's large enough?

I'm not sure what you mean. I wrote this a while ago; could you point me to what you mean specifically?

jeanL
05-27-2014, 04:13 PM
After careful review of Klyosov's 2009a paper and his 2011 paper, I have come to notice that a lot of these calibrated TMRCAs went through extensive data processing, which pretty much defeats the purpose of using random data. Klyosov's correction formula for back mutations:

lambda = (lambda_obs / 2) * (1 + exp(lambda_obs)), for completely symmetrical mutations

is pretty much a fudge factor: a way for him to manipulate the data to fit his calculations. STR trees aren't symmetric, for the mere reason that the mutation rate itself increases with repeat length, so there is a bias for up mutations vs. down mutations. Also, none of his methods, be it the mutation counting method or the logarithmic method, is backed by any other methodology, be it modeling mutations as discrete Bernoulli trials or as a continuous-time Poisson process.
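To see how large this correction gets, here is a minimal Python sketch. It assumes the formula above groups as (lambda_obs / 2) * (1 + exp(lambda_obs)), where lambda_obs is the observed average number of mutations per marker; the function name and the sample inputs are my own, picked only for illustration.

```python
import math

def corrected_lambda(lambda_obs):
    """Back-mutation correction, assuming the grouping
    (lambda_obs / 2) * (1 + exp(lambda_obs))."""
    return (lambda_obs / 2.0) * (1.0 + math.exp(lambda_obs))

# Illustrative observed values (mutations per marker), not real data.
for lambda_obs in (0.05, 0.2, 0.5, 1.0):
    lam = corrected_lambda(lambda_obs)
    print(f"observed {lambda_obs:.2f} -> corrected {lam:.4f} "
          f"(multiplier {lam / lambda_obs:.3f})")
```

The multiplier is (1 + exp(lambda_obs)) / 2, so it is close to 1 when few mutations are observed and grows quickly as lambda_obs increases. The older the lineage, the more the final figure depends on the correction itself.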

All STR mutation rates are subject to a constraint: any TMRCA older than 1/mu, where mu is the mutation rate, will be underestimated by the ASD methodology.
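As a rough worked example of where that 1/mu ceiling falls, here is a small Python sketch. The mutation rates are illustrative values, not measured ones, and the 25 years per generation is an assumption used only to convert generations to years; the function name is hypothetical.

```python
def asd_ceiling(mu, years_per_generation=25.0):
    """Return the 1/mu ceiling in generations and in years.

    mu is the per-marker, per-generation mutation rate; the
    25-year generation length is an assumed conversion factor.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    generations = 1.0 / mu
    return generations, generations * years_per_generation

# Illustrative rates only: a slow, a medium and a fast marker.
for mu in (0.0005, 0.002, 0.008):
    gens, years = asd_ceiling(mu)
    print(f"mu={mu}: 1/mu = {gens:.0f} generations, about {years:.0f} years")
```

So under these assumptions a fast marker hits the ceiling within a few thousand years, while a slow one stays usable for far longer.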